Abstract

Cognitive assessment is a growing area in psychological and educational measurement, where tests are given to assess mastery or deficiency of attributes or skills. A key issue is the correct identification of attributes associated with items in a test. In this paper, we set up a mathematical framework under which theoretical properties may be discussed. We establish sufficient conditions to ensure that the attributes required by each item are learnable from the data.

1 Introduction 

Cognitive diagnosis has recently gained prominence in educational assessment, psychiatric evaluation, and many other disciplines. A key task is the correct specification of item-attribute relationships. A widely used mathematical formulation is the well-known Q-matrix introduced by [26]. A short list of further developments of cognitive diagnosis models (CDMs) based on the Q-matrix includes the rule space method [27, 28], the reparameterized unified/fusion model (RUM) [5, 7, 29], the conjunctive (noncompensatory) DINA and NIDA models [12, 23, ...], the compensatory DINO and NIDO models [30, ...], the attribute hierarchy method [13], and clustering methods [1]; see also [11] and [22] for more approaches to cognitive diagnosis.

Under the setting of the Q-matrix, a typical modeling approach assumes a latent variable structure in which each subject possesses a vector of $k$ attributes and responds to $m$ items. The so-called Q-matrix is an $m \times k$ binary matrix establishing the relationship between responses and attributes by indicating the required attributes for each item. The entry in the $i$-th row and $j$-th column indicates whether item $i$ requires attribute $j$. Statistical analysis with such models typically assumes a known Q-matrix provided by experts, such as those who developed the questions [19, ...]. Such a priori knowledge, when correct, is certainly very helpful for both model estimation and the eventual identification of subjects' latent attributes. On the other hand, model fitting is usually sensitive to the choice of Q-matrix, and its misspecification could seriously affect the goodness of fit. This is one of the main sources of lack of fit. Various diagnostic tools and testing procedures have been developed [20, ...]. A recent review of diagnostic classification models is given by [...].

Despite the importance of the Q-matrix in cognitive diagnosis, its estimation problem is largely an unexplored area. Unlike typical inference problems, inference for the Q-matrix is particularly challenging for the following reasons. First, in many cases, the Q-matrix is simply nonidentifiable. One typical situation is that multiple Q-matrices lead to an identical response distribution. Therefore, we can only expect to identify the Q-matrix up to some equivalence relation (Definition 2). In other words, two Q-matrices in the same equivalence class are not distinguishable based on data. Our first task is to define a meaningful and identifiable equivalence class. Second, the Q-matrix lives on a discrete space: the set of $m \times k$ matrices with binary entries. This discrete nature makes analysis particularly difficult because calculus tools are not applicable. In fact, most analyses are combinatorial. Third, the model makes explicit distributional assumptions on the (unobserved) attributes, dictating the law of observed responses. The dependence of responses on attributes via the Q-matrix is a highly nonlinear discrete function. The nonlinearity also adds to the difficulty of the analysis.

The primary purpose of this paper is to provide theoretical analyses of the learnability of the underlying Q-matrix. In particular, we obtain definitive answers to the identifiability of the Q-matrix for one of the most commonly used models, the DINA model, by specifying a set of sufficient conditions under which the Q-matrix is identifiable up to an explicitly defined equivalence class. We also present the corresponding consistent estimators. We believe that the results (especially the intermediate results) and analysis strategies can be extended to other conjunctive models [15, 12, ...].

The rest of this paper is organized as follows. In Section 2, we present the basic inference result for Q-matrices in a conjunctive model with no slipping or guessing. In addition, we introduce all the necessary terminology and technical conditions. In Section 3, we extend the results in Section 2 to the DINA model with known slipping and guessing parameters. In Section 4, we further generalize the results to the DINA model with unknown slipping parameters. Further discussion is provided in Section 5. Proofs are given in Section 6. Lastly, the proofs of two key propositions are given in Appendix A.

2 Model specifications and basic results 

We start the discussion with a simplified situation, under which the responses depend on the attribute profile deterministically (with no uncertainty). We describe our estimation procedure under this simple scenario. The results for the general cases are given in Sections 3 and 4.

2.1 Basic model specifications 

The model specifications consist of the following concepts. 

Attributes: a subject's (unobserved) mastery of certain skills. Consider that there are $k$ attributes. Let $A = (A^1, \dots, A^k)^T$ be the vector of attributes and $A^j \in \{0, 1\}$ be the indicator of the presence or absence of the $j$-th attribute.
Responses: a subject's binary responses to items. Consider that there are $m$ items. Let $R = (R^1, \dots, R^m)^T$ be the vector of responses and $R^i \in \{0, 1\}$ be the response to the $i$-th item.

Both $A$ and $R$ are subject-specific. We assume that the integers $m$ and $k$ are known.

Q-matrix: the link between item responses and attributes. We define an $m \times k$ matrix $Q = (Q_{ij})_{m \times k}$. For each $i$ and $j$, $Q_{ij} = 1$ when item $i$ requires attribute $j$ and $Q_{ij} = 0$ otherwise. Furthermore, we define

$$\xi^i = \prod_{j=1}^{k} (A^j)^{Q_{ij}} = I(A^j \geq Q_{ij} : j = 1, \dots, k), \qquad (1)$$

which indicates whether a subject with attribute profile $A$ is capable of providing a positive response to item $i$. This model is conjunctive, meaning that it is necessary and sufficient to master all the required skills to be capable of solving a problem. Possessing additional attributes does not compensate for the absence of required attributes. In this section, we consider the simplest situation in which there is no uncertainty in the response, that is,

$$R^i = \xi^i \qquad (2)$$

for $i = 1, \dots, m$. Therefore, the responses are completely determined by the attributes. We assume that every item requires at least one attribute; equivalently, the Q-matrix has no zero row vectors. Subjects who do not possess any attribute are therefore not capable of responding positively to any item.
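The conjunctive rule (1) and the deterministic model (2) can be illustrated with a short Python sketch. The function names and the list-of-rows encoding of $Q$ are our own illustrative choices, not the paper's; $(A^j)^{Q_{ij}}$ is computed as the indicator $A^j \geq Q_{ij}$, which is equivalent for binary entries. The example matrix is the addition/multiplication Q-matrix of Section 2.3.

```python
from typing import List, Sequence

def capability(q_row: Sequence[int], a: Sequence[int]) -> int:
    """xi^i = prod_j (A^j)^{Q_ij}: 1 iff every attribute required by the item is mastered."""
    return int(all(a_j >= q_j for q_j, a_j in zip(q_row, a)))

def ideal_responses(Q: Sequence[Sequence[int]], a: Sequence[int]) -> List[int]:
    """Deterministic responses R^i = xi^i of model (2) for one attribute profile a."""
    return [capability(row, a) for row in Q]

# Q-matrix of the Section 2.3 example (columns: addition, multiplication)
Q_EXAMPLE = [[1, 0],   # 2 + 3
             [0, 1],   # 5 x 2
             [1, 1]]   # (2 + 3) x 2

for profile in [(0, 0), (1, 0), (0, 1), (1, 1)]:
    print(profile, ideal_responses(Q_EXAMPLE, profile))
```

Each printed line is the ideal response pattern $(\xi^1, \xi^2, \xi^3)$ of one profile; for instance, a subject who masters only addition answers only item $2 + 3$ correctly.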

We use subscripts to indicate different subjects. For instance, $R_r = (R_r^1, \dots, R_r^m)^T$ is the response vector of subject $r$. Similarly, $A_r$ is the attribute vector of subject $r$. We observe $R_1, \dots, R_N$, where $N$ denotes the sample size. The attributes $A_r$ are not observed. Our objective is to make inference on the Q-matrix based on the observed responses.

2.2 Estimation of Q-matrix 

We first introduce a few quantities for the presentation of an estimator. 

T-matrix. In order to provide an estimator of $Q$, we first introduce one central quantity, the T-matrix, which connects the Q-matrix with the response and attribute distributions. Matrix $T(Q)$ has $2^k - 1$ columns, each of which corresponds to one nonzero attribute vector $A \in \{0, 1\}^k \setminus \{(0, \dots, 0)\}$. Instead of labeling the columns of $T(Q)$ by ordinal numbers, we label them by binary vectors of length $k$. For instance, the $A$-th column of $T(Q)$ is the column that corresponds to attribute profile $A$, for all $A \neq (0, \dots, 0)$.

Let $I_i$ be a generic notation for positive responses to item $i$. Let $\wedge$ stand for the "and" combination. For instance, $I_{i_1} \wedge I_{i_2}$ denotes positive responses to both items $i_1$ and $i_2$. Each row of $T(Q)$ corresponds to one item or one "and" combination of items, for instance, $I_{i_1}$, $I_{i_1} \wedge I_{i_2}$, or $I_{i_1} \wedge I_{i_2} \wedge I_{i_3}$, and so on. If $T(Q)$ contains all the single items and all "and" combinations, then $T(Q)$ contains $2^m - 1$ rows. We will later say that such a $T(Q)$ is saturated (Definition 1 in Section 2.4).

We now describe each row vector of $T(Q)$. Let $B_Q(I_i)$ be a $(2^k - 1)$-dimensional row vector. Using the same labeling system as that of the columns of $T(Q)$, the $A$-th element of $B_Q(I_i)$ is defined as $\prod_{j=1}^{k} (A^j)^{Q_{ij}}$, which indicates whether a subject with attribute profile $A$ is able to solve item $i$.

Using a similar notation, we define

$$B_Q(I_{i_1} \wedge \cdots \wedge I_{i_l}) = \Upsilon_{h=1}^{l} B_Q(I_{i_h}), \qquad (3)$$

where the operator "$\Upsilon_{h=1}^{l}$" denotes element-by-element multiplication of $B_Q(I_{i_1})$ through $B_Q(I_{i_l})$. For instance,

$$W = \Upsilon_{h=1}^{l} V_h$$

means that $W^j = \prod_{h=1}^{l} V_h^j$, where $W = (W^1, \dots, W^{2^k - 1})$ and $V_h = (V_h^1, \dots, V_h^{2^k - 1})$. Therefore, $B_Q(I_{i_1} \wedge \cdots \wedge I_{i_l})$ is the vector indicating the attribute profiles that are capable of responding positively to all of the items $i_1, \dots, i_l$. The row in $T(Q)$ corresponding to $I_{i_1} \wedge \cdots \wedge I_{i_l}$ is $B_Q(I_{i_1} \wedge \cdots \wedge I_{i_l})$.
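A minimal sketch of how a saturated $T(Q)$ could be assembled from (3), assuming $Q$ is stored as a list of 0/1 rows; the helper names are hypothetical. Columns are ordered by the number of mastered attributes and then by attribute position, which for $k = 2$ gives the order $(1, 0), (0, 1), (1, 1)$ used in the example of Section 2.3; rows run over all nonempty item subsets, giving $2^m - 1$ rows.

```python
from itertools import combinations, product

def nonzero_profiles(k):
    """Column labels of T(Q): every A in {0,1}^k except the zero vector."""
    profiles = [a for a in product((0, 1), repeat=k) if any(a)]
    # Fewer mastered attributes first; ties broken so earlier attributes come first.
    return sorted(profiles, key=lambda a: (sum(a), [-x for x in a]))

def b_row(Q, items, profiles):
    """B_Q(I_{i1} ^ ... ^ I_{il}): element-wise product of the single-item rows B_Q(I_i)."""
    return [int(all(all(a_j >= q_j for q_j, a_j in zip(Q[i], a)) for i in items))
            for a in profiles]

def t_matrix(Q):
    """Saturated T(Q): one row per nonempty item combination (2^m - 1 rows)."""
    m, k = len(Q), len(Q[0])
    profiles = nonzero_profiles(k)
    combos = [c for l in range(1, m + 1) for c in combinations(range(m), l)]
    return combos, profiles, [b_row(Q, c, profiles) for c in combos]

Q_EXAMPLE = [[1, 0], [0, 1], [1, 1]]   # addition / multiplication example
combos, profiles, T = t_matrix(Q_EXAMPLE)
for c, row in zip(combos, T):
    print(c, row)   # zero-based item indices and the corresponding row of T(Q)
```

The first three printed rows correspond to the single items $2 + 3$, $5 \times 2$, and $(2 + 3) \times 2$.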

$\alpha$-vector. We let $\alpha$ be a column vector whose length equals the number of rows of $T(Q)$. Each element of $\alpha$ corresponds to one row vector of $T(Q)$. The element in $\alpha$ corresponding to $I_{i_1} \wedge \cdots \wedge I_{i_l}$ is defined as $N_{I_{i_1} \wedge \cdots \wedge I_{i_l}} / N$, where $N_{I_{i_1} \wedge \cdots \wedge I_{i_l}}$ denotes the number of people who have positive responses to items $i_1, \dots, i_l$, that is,

$$N_{I_{i_1} \wedge \cdots \wedge I_{i_l}} = \sum_{r=1}^{N} I(R_r^{i_j} = 1 : j = 1, \dots, l).$$
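The $\alpha$-vector only requires counting joint positive responses. The sketch below (hypothetical helper name, rows ordered like the item combinations of a saturated $T(Q)$) computes it for four made-up subjects answering the three items of Section 2.3.

```python
from itertools import combinations

def alpha_vector(responses):
    """For each nonempty item combination, N_{I_i1 ^ ... ^ I_il} / N:
    the fraction of subjects answering every item in the combination correctly."""
    n, m = len(responses), len(responses[0])
    combos = [c for l in range(1, m + 1) for c in combinations(range(m), l)]
    return combos, [sum(all(r[i] == 1 for i in c) for r in responses) / n for c in combos]

# Hypothetical responses of four subjects to the three items
R = [(1, 0, 0), (1, 1, 1), (0, 1, 0), (1, 1, 1)]
combos, alpha = alpha_vector(R)
print(list(zip(combos, alpha)))
```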

For each $A \in \{0, 1\}^k$, we let

$$\hat p_A = \frac{1}{N} \sum_{r=1}^{N} I(A_r = A). \qquad (4)$$

If (2) is strictly respected, then

$$T(Q)\hat p = \alpha, \qquad (5)$$

where $\hat p = (\hat p_A : A \in \{0, 1\}^k \setminus \{(0, \dots, 0)\})$ is arranged in the same order as the columns of $T(Q)$. This is because each row of $T(Q)$ indicates the attribute profiles corresponding to subjects capable of responding positively to that set of item(s). Vector $\hat p$ contains the proportions of subjects with each attribute profile. For each set of items, matrix multiplication sums up the proportions corresponding to each attribute profile capable of responding positively to that set of items, giving us the total proportion of subjects who respond positively to the corresponding items.
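Identity (5) can be checked exactly on a toy data set. In this sketch the attribute profiles of six subjects are made up (in practice they are unobserved), responses are generated by the deterministic rule (2), and `fractions.Fraction` keeps the arithmetic exact so that $T(Q)\hat p$ and $\alpha$ can be compared with `==`.

```python
from fractions import Fraction
from itertools import combinations, product

Q = [[1, 0], [0, 1], [1, 1]]          # Section 2.3 example
m, k = len(Q), len(Q[0])

def capable(i, a):
    """xi^i for profile a, as in (1)."""
    return all(a_j >= q_j for q_j, a_j in zip(Q[i], a))

# Hypothetical attribute profiles of N = 6 subjects
attributes = [(0, 0), (1, 0), (1, 0), (0, 1), (1, 1), (1, 1)]
responses = [[int(capable(i, a)) for i in range(m)] for a in attributes]   # model (2)

profiles = [a for a in product((0, 1), repeat=k) if any(a)]
combos = [c for l in range(1, m + 1) for c in combinations(range(m), l)]
N = len(attributes)

p_hat = [Fraction(attributes.count(a), N) for a in profiles]                # (4)
T = [[int(all(capable(i, a) for i in c)) for a in profiles] for c in combos]  # rows (3)
alpha = [Fraction(sum(all(r[i] for i in c) for r in responses), N) for c in combos]

lhs = [sum(t * p for t, p in zip(row, p_hat)) for row in T]
print(lhs == alpha)   # identity (5) holds exactly under (2)
```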

The estimator of the Q-matrix. For each $m \times k$ binary matrix $Q'$, we define

$$S(Q') = \inf_{p \in [0, 1]^{2^k - 1}} |T(Q')p - \alpha|, \qquad (6)$$

where $p = (p_A : A \neq (0, \dots, 0))$. The above minimization is subject to the constraint that $\sum_{A \neq (0, \dots, 0)} p_A \in [0, 1]$, and $|\cdot|$ is the Euclidean distance. An estimator of $Q$ can be obtained by minimizing $S(Q')$,

$$\hat Q = \arg\inf_{Q'} S(Q'), \qquad (7)$$

where "arg inf" is the minimizer of the minimization problem over all $m \times k$ binary matrices. Note that the minimizers are not unique. We will later prove that the minimizers are in the same meaningful equivalence class. Because of (5), the true Q-matrix is always among the minimizers, since $S(Q) = 0$.
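The sketch below evaluates (6) and (7) by brute force for the $3 \times 2$ example of Section 2.3. The paper only states that (6) is a constrained quadratic minimization; using projected gradient descent (with a sort-based projection onto $\{p \geq 0, \sum p \leq 1\}$) is our own choice, as are all helper names and the hypothetical attribute distribution. Exhaustive search over all candidate matrices is only feasible for tiny $m$ and $k$; candidates with zero rows are skipped because every item is assumed to require at least one attribute.

```python
import math
from itertools import combinations, product

def project_capped_simplex(v):
    """Euclidean projection onto {p >= 0, sum(p) <= 1}, the feasible set of (6)."""
    w = [max(x, 0.0) for x in v]
    if sum(w) <= 1.0:
        return w
    u = sorted(v, reverse=True)            # otherwise project onto {p >= 0, sum(p) = 1}
    css, theta = 0.0, 0.0
    for j, uj in enumerate(u, start=1):
        css += uj
        if uj - (css - 1.0) / j > 0:
            theta = (css - 1.0) / j
    return [max(x - theta, 0.0) for x in v]

def capable(q_row, a):
    return all(a_j >= q_j for q_j, a_j in zip(q_row, a))

def item_combos(m):
    return [c for l in range(1, m + 1) for c in combinations(range(m), l)]

def t_matrix(Q):
    profiles = [a for a in product((0, 1), repeat=len(Q[0])) if any(a)]
    return [[int(all(capable(Q[i], a) for i in c)) for a in profiles] for c in item_combos(len(Q))]

def s_value(Q, alpha, iters=800):
    """Approximate S(Q') of (6) by projected gradient descent on |T p - alpha|^2."""
    T = t_matrix(Q)
    n = len(T[0])
    step = 1.0 / (2.0 * sum(x * x for row in T for x in row))   # 1/L via a Frobenius bound
    p = [0.0] * n
    for _ in range(iters):
        r = [sum(t * pj for t, pj in zip(row, p)) - a for row, a in zip(T, alpha)]
        grad = [2.0 * sum(row[j] * rh for row, rh in zip(T, r)) for j in range(n)]
        p = project_capped_simplex([pj - step * gj for pj, gj in zip(p, grad)])
    r = [sum(t * pj for t, pj in zip(row, p)) - a for row, a in zip(T, alpha)]
    return math.sqrt(sum(x * x for x in r))

def population_alpha(Q, p_star):
    """alpha under model (2) when the sample proportions equal p_star exactly."""
    return [sum(pa for a, pa in p_star.items() if all(capable(Q[i], a) for i in c))
            for c in item_combos(len(Q))]

def columns(Q):
    return sorted(zip(*Q))   # Definition 2: compare the multisets of columns

Q_TRUE = ((1, 0), (0, 1), (1, 1))
P_STAR = {(1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.4}   # hypothetical; p_00 = 0.1
alpha = population_alpha(Q_TRUE, P_STAR)

rows = [(1, 0), (0, 1), (1, 1)]                    # each item needs >= 1 attribute
scores = {cand: s_value(cand, alpha) for cand in product(rows, repeat=len(Q_TRUE))}
q_hat = min(scores, key=scores.get)
print(q_hat, columns(q_hat) == columns(Q_TRUE))
```

Two candidates attain $S \approx 0$ here: the true matrix and its column-swapped copy, which is exactly the non-uniqueness up to "$\sim$" described above.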
2.3 Example

We illustrate the above construction with a simple example. We emphasize that this example is meant only to explain the estimation procedure in a concrete setting; the proposed estimator can certainly handle much larger Q-matrices. We consider the following $3 \times 2$ Q-matrix,





$$Q = \begin{array}{l|cc} & \text{addition} & \text{multiplication} \\ \hline 2 + 3 & 1 & 0 \\ 5 \times 2 & 0 & 1 \\ (2 + 3) \times 2 & 1 & 1 \end{array} \qquad (8)$$



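There are two attributes and three items. We consider the contingency table of attributes,

$$\begin{array}{c|cc} & \text{multiplication} = 0 & \text{multiplication} = 1 \\ \hline \text{addition} = 0 & p_{00} & p_{01} \\ \text{addition} = 1 & p_{10} & p_{11} \end{array}$$

In the above table, $p_{00}$ is the proportion of people who master neither addition nor multiplication. Similarly, we define $p_{01}$, $p_{10}$, and $p_{11}$. The table $\{p_{ij} : i, j = 0, 1\}$ is not observed.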

Just for illustration, we construct a simple T-matrix. Suppose the relationship in (2) is strictly respected. Then, we should be able to establish the following identities:

$$N(p_{10} + p_{11}) = N_{I_1}, \quad N(p_{01} + p_{11}) = N_{I_2}, \quad N p_{11} = N_{I_3}. \qquad (9)$$

Therefore, if we let $p = (p_{10}, p_{01}, p_{11})$, the above display imposes three linear constraints on the vector $p$. Together with the natural constraint that $\sum_{ij} p_{ij} = 1$, $p$ solves the linear equation

$$T(Q)p = \alpha, \qquad (10)$$

subject to the constraints that $p \in [0, 1]^3$ and $p_{10} + p_{01} + p_{11} \in [0, 1]$, where

$$T(Q) = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \alpha = \begin{pmatrix} N_{I_1}/N \\ N_{I_2}/N \\ N_{I_3}/N \end{pmatrix}. \qquad (11)$$

Each column of $T(Q)$ corresponds to one attribute profile. The first column corresponds to $A = (1, 0)$, the second column to $A = (0, 1)$, and the third column to $A = (1, 1)$. The first row corresponds to item $2 + 3$, the second row to $5 \times 2$, and the last row to $(2 + 3) \times 2$. For this particular situation, $T(Q)$ has full rank and there exists one unique solution to (10). In fact, we would not expect the constrained solution to the linear equation in (10) to always exist unless (2) is strictly followed. This is the topic of the next section.
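Because $T(Q)$ in (11) is upper triangular, (10) can be solved by back substitution. The sketch below does this with exact fractions for made-up counts ($N = 100$); the function name is hypothetical. It also shows how the constrained solution can fail to exist when the data violate (2): if fewer people solve $2 + 3$ than $(2 + 3) \times 2$, the implied $p_{10}$ is negative.

```python
from fractions import Fraction

def solve_example(n_i1, n_i2, n_i3, n):
    """Back substitution for T(Q) p = alpha in (10)-(11)."""
    a1, a2, a3 = (Fraction(x, n) for x in (n_i1, n_i2, n_i3))
    p11 = a3              # third row:  p11       = N_I3 / N
    p10 = a1 - p11        # first row:  p10 + p11 = N_I1 / N
    p01 = a2 - p11        # second row: p01 + p11 = N_I2 / N
    p00 = 1 - (p10 + p01 + p11)
    return {"p10": p10, "p01": p01, "p11": p11, "p00": p00}

sol = solve_example(70, 60, 40, 100)          # hypothetical counts
feasible = all(0 <= v <= 1 for v in sol.values())
print(sol, feasible)

bad = solve_example(30, 60, 40, 100)          # N_I1 < N_I3 contradicts (2)
print(bad["p10"])                             # negative: no constrained solution
```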

The identities in (9) only consider the marginal rate of each question. There are additional constraints if one considers "combinations" among items. For instance,

$$I_3 \Rightarrow I_1 \wedge I_2.$$

People who are able to solve problem 3 must have both attributes and therefore are able to solve both problems 1 and 2. Again, if (2) is not strictly followed, this is not necessarily respected in the real data, though it is a logical conclusion. Upon considering $I_1$, $I_2$, $I_3$, and $I_1 \wedge I_2$, the new T-matrix is

$$T(Q) = \begin{pmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \end{pmatrix}, \qquad \alpha = \begin{pmatrix} N_{I_1}/N \\ N_{I_2}/N \\ N_{I_3}/N \\ N_{I_1 \wedge I_2}/N \end{pmatrix}. \qquad (12)$$

The last row is added corresponding to $I_1 \wedge I_2$. With (2) in force, we have

$$S(Q) = \inf_{p \in [0, 1]^3} |T(Q)p - \alpha| = |T(Q)(p_{10}, p_{01}, p_{11})^T - \alpha| = 0, \qquad (13)$$

if $Q$ is the true matrix.

2.4 Basic results 

Before stating the main result, we provide a list of notation that will be used in the discussions.

• Linear space spanned by vectors $V_1, \dots, V_l$:
$$\mathcal{C}(V_1, \dots, V_l) = \Big\{ \sum_{j=1}^{l} a_j V_j : a_j \in \mathbb{R} \Big\}.$$

• For a matrix $M$, $M_{1:l}$ denotes the submatrix containing the first $l$ rows and all columns of $M$.

• Vector $e_i$ denotes a column vector such that the $i$-th element is one and the rest are zero. When there is no ambiguity, we omit the length index of $e_i$.

• Matrix $\mathcal{I}_l$ denotes the $l \times l$ identity matrix.

• For a matrix $M$, $\mathcal{C}(M)$ is the linear space generated by the column vectors of $M$. It is usually called the column space of $M$.

• $C_M$ denotes the set of column vectors of $M$.

• $R_M$ denotes the set of row vectors of $M$.

• Vector $\mathbf{0}$ denotes the zero vector $(0, \dots, 0)$. When there is no ambiguity, we omit the index of length.

• Scalar $p_A$ denotes the probability that a subject has attribute profile $A$. For instance, $p_{10}$ is the probability that a subject has attribute one but not attribute two.

• Define a $(2^k - 1)$-dimensional vector
$$p = (p_A : A \in \{0, 1\}^k \setminus \{\mathbf{0}\}).$$

• Let $c$ and $g$ be two $m$-dimensional vectors. We write $c \succ g$ if $c_i > g_i$ for all $1 \leq i \leq m$.

• We write $c \neq g$ if $c_i \neq g_i$ for all $i = 1, \dots, m$.

• Matrix $Q$ denotes the true matrix and $Q'$ denotes a generic $m \times k$ binary matrix.

The following definitions will be used in subsequent discussions.

Definition 1 We say that $T(Q)$ is saturated if all combinations of the form $I_{i_1} \wedge \cdots \wedge I_{i_l}$, for $l = 1, \dots, m$, are included in $T(Q)$.

Definition 2 We write $Q \sim Q'$ if and only if $Q$ and $Q'$ have identical column vectors, which could be arranged in different orders; otherwise, we write $Q \nsim Q'$.

Remark 3 It is not hard to show that "~" is an equivalence relation. Q ~ Q' if and only 
if they are identical after an appropriate permutation of the columns. Each column of Q 
is interpreted as an attribute. Permuting the columns of Q is equivalent to relabeling the 
attributes. For Q ~ Q' , we are not able to distinguish Q from Q' based on data. 

Definition 4 A Q-matrix is said to be complete if $\{e_i^T : i = 1, \dots, k\} \subset R_Q$ ($R_Q$ is the set of row vectors of $Q$); otherwise, we say that $Q$ is incomplete.

A Q-matrix is complete if and only if for each attribute there exists an item requiring only that attribute. Completeness implies that $m \geq k$. We will show that completeness is among the sufficient conditions to identify $Q$.
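Definitions 2 and 4 are easy to check mechanically. This sketch (hypothetical helper names) tests completeness by looking for every unit row $e_j^T$ among the rows of $Q$, and tests $Q_1 \sim Q_2$ by comparing the sorted lists of columns, which treats the columns as a multiset.

```python
def is_complete(Q):
    """Definition 4: every unit row e_j^T (an item requiring only attribute j) appears in Q."""
    k = len(Q[0])
    rows = {tuple(r) for r in Q}
    return all(tuple(int(jj == j) for jj in range(k)) in rows for j in range(k))

def equivalent(Q1, Q2):
    """Definition 2: Q1 ~ Q2 iff their columns coincide after some permutation."""
    return sorted(zip(*Q1)) == sorted(zip(*Q2))

Q = [[1, 0], [0, 1], [1, 1]]
print(is_complete(Q), is_complete([[1, 1], [0, 1], [1, 1]]))   # True False
print(equivalent(Q, [[0, 1], [1, 0], [1, 1]]))                 # True
```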

Remark 5 One of the main objectives of cognitive assessment is to identify the subjects' attributes; see [...] for other applications. The completeness of the Q-matrix is a necessary condition for a set of items to consistently identify attributes. Thus, it is always recommended to use a complete Q-matrix. For a precise formulation, see [7].

Listed below are assumptions which will be used in subsequent development.

C1 $Q$ is complete.

C2 $T(Q)$ is saturated.

C3 $A_1, \dots, A_N$ are i.i.d. random vectors following the distribution
$$P(A_r = A) = p^*_A.$$
We further let $p^* = (p^*_A : A \in \{0, 1\}^k \setminus \{\mathbf{0}\})$.

C4 $(p^*_{\mathbf{0}}, p^*) \succ \mathbf{0}$.

With these preparations, we are ready to introduce the first theorem, the proof of which is given in Section 6.
Theorem 6 Assume that conditions C1-C4 are in force. Suppose that for subject $r$ the response corresponding to item $i$ follows

$$R_r^i = \xi_r^i = \prod_{j=1}^{k} (A_r^j)^{Q_{ij}}.$$

Let $\hat Q$, defined in (7), be a minimizer of $S(Q')$ among all $m \times k$ binary matrices, where $S(Q')$ is defined in (6). Then,

$$\lim_{N \to \infty} P(\hat Q \sim Q) = 1. \qquad (14)$$

Further, let

$$\hat p = \arg\inf_{p} |T(\hat Q)p - \alpha|^2. \qquad (15)$$

With an appropriate rearrangement of the columns of $\hat Q$, for any $\varepsilon > 0$,

$$\lim_{N \to \infty} P(|\hat p - p^*| < \varepsilon) = 1.$$

Remark 7 If $Q_1 \sim Q_2$, the two matrices only differ by a column permutation and will be considered to be the "same". Therefore, we expect to identify the equivalence class that $Q$ belongs to. Also, note that $S(Q_1) = S(Q_2)$ if $Q_1 \sim Q_2$.

Remark 8 In order to obtain the consistency of $\hat Q$ (subject to a column permutation), it is necessary to have $p^*$ not living on some sub-manifold. To see a counterexample, suppose that $P(A_r = (1, 1)^T) = p_{11} = 1$. Then, for all $Q$, $P(R_r = (1, \dots, 1)^T) = 1$; that is, all subjects are able to solve all problems. Therefore, the distribution of $R$ is independent of $Q$. In other words, the Q-matrix is not identifiable. More generally, if there exist $A_r^i$ and $A_r^j$ such that $P(A_r^i = A_r^j) = 1$, then the Q-matrix is not identifiable based on the data. This is because one cannot tell whether an item requires attribute $i$ alone, attribute $j$ alone, or both; see [...] for similar cases for the multidimensional IRT models.

Remark 9 Note that the estimator of the attribute distribution, $\hat p$, in (15) depends on the order of the columns of $\hat Q$. In order to achieve consistency, we need to arrange the columns of $\hat Q$ such that $\hat Q = Q$ whenever $\hat Q \sim Q$.

Remark 10 One practical issue associated with the proposed procedure is computation. For a specific $Q$, the computation of $S(Q)$ only involves a constrained minimization of a quadratic function. However, if $m$ or $k$ is large, the computational cost of searching for the minimizer of $S(Q)$ over the space of $m \times k$ binary matrices (there are $2^{mk}$ of them) could be substantial. One practical solution is to break the Q-matrix into smaller sub-matrices. For instance, one may divide the $m$ items into $l$ groups (possibly with nonempty overlap across different groups) and then apply the proposed estimator to each of the $l$ groups of items. This is equivalent to breaking a big $m \times k$ Q-matrix into several smaller matrices and estimating each of them separately. Lastly, one combines the $l$ estimated sub-matrices to form a single estimate. The consistency results can be applied to each of the $l$ sub-matrices, and therefore the combined matrix is also a consistent estimator. A similar technique has been discussed in Chapter 8.6 of [28].



3 DINA model with known slipping and guessing parameters



3.1 Model specification 

In this section, we extend the inference results of the previous section to the situation in which the responses do not deterministically depend on the attributes. In particular, we consider the DINA (Deterministic Input, Noisy Output "AND" gate) model [12]. We introduce two parameters: the slipping parameter ($s_i$) and the guessing parameter ($g_i$). Here $1 - s_i$ ($g_i$) is the probability of a subject's responding positively to item $i$ given that s/he is capable (incapable) of solving that problem. To simplify the notation, we denote $1 - s_i$ by $c_i$. An extension of (2) to include slipping and guessing specifies the response probabilities as

$$P(R^i = 1 \mid \xi^i) = c_i^{\xi^i} g_i^{1 - \xi^i}, \qquad (16)$$

where $\xi^i$ is the capability indicator defined in (1). In addition, conditional on $\{\xi^1, \dots, \xi^m\}$, the responses $\{R^1, \dots, R^m\}$ are jointly independent.

In this context, the T-matrix needs to be modified accordingly. Throughout this section, we assume that both the $c_i$'s and the $g_i$'s are known. We discuss the case in which the $c_i$'s are unknown in the next section.

We first consider the case that $g_i = 0$ for all $i = 1, \dots, m$. We introduce a diagonal matrix $D_c$. If the $h$-th row of matrix $T_c(Q)$ corresponds to $I_{i_1} \wedge \cdots \wedge I_{i_l}$, then the $h$-th diagonal element of $D_c$ is $c_{i_1} \times \cdots \times c_{i_l}$. Then, we let

$$T_c(Q) = D_c T(Q), \qquad (17)$$

where $T(Q)$ is the binary matrix defined previously. In other words, we multiply each row of $T(Q)$ by a common factor and obtain $T_c(Q)$. Note that in the absence of slipping ($c_i = 1$ for each $i$) we have $T_c(Q) = T(Q)$.

There is another equivalent way of constructing $T_c(Q)$. We define

$$B_{c,Q}(I_j) = c_j B_Q(I_j)$$

and

$$B_{c,Q}(I_{i_1} \wedge \cdots \wedge I_{i_l}) = \Upsilon_{h=1}^{l} B_{c,Q}(I_{i_h}), \qquad (18)$$

where "$\Upsilon$" refers to element-by-element multiplication. Let the row vector in $T_c(Q)$ corresponding to $I_{i_1} \wedge \cdots \wedge I_{i_l}$ be $B_{c,Q}(I_{i_1} \wedge \cdots \wedge I_{i_l})$.

For instance, with $c = (c_1, c_2, c_3)$, the $T_c(Q)$ corresponding to the T-matrix in (12) would be

$$T_c(Q) = \begin{pmatrix} c_1 & 0 & c_1 \\ 0 & c_2 & c_2 \\ 0 & 0 & c_3 \\ 0 & 0 & c_1 c_2 \end{pmatrix}. \qquad (19)$$

Lastly, we consider the situation in which both the probability of making a mistake and the probability of guessing correctly could be strictly positive. By this, we mean that the probability that a subject responds positively to item $i$ is $c_i$ if s/he is capable of doing so; otherwise the probability is $g_i$. We create a corresponding $T_{c,g}(Q)$ by slightly modifying $T_c(Q)$. We define the row vector

$$E = (1, \dots, 1)$$

of length $2^k - 1$. When there is no ambiguity, we omit the length index of $E$. Now, let

$$B_{c,g,Q}(I_i) = g_i E + (c_i - g_i) B_Q(I_i)$$

and

$$B_{c,g,Q}(I_{i_1} \wedge \cdots \wedge I_{i_l}) = \Upsilon_{h=1}^{l} B_{c,g,Q}(I_{i_h}). \qquad (20)$$

Let the row vector in $T_{c,g}(Q)$ corresponding to $I_{i_1} \wedge \cdots \wedge I_{i_l}$ be $B_{c,g,Q}(I_{i_1} \wedge \cdots \wedge I_{i_l})$. For instance, the matrix $T_{c,g}(Q)$ corresponding to the $T_c(Q)$ in (19) is

$$T_{c,g}(Q) = \begin{pmatrix} c_1 & g_1 & c_1 \\ g_2 & c_2 & c_2 \\ g_3 & g_3 & c_3 \\ c_1 g_2 & g_1 c_2 & c_1 c_2 \end{pmatrix}. \qquad (21)$$

It is common in practice to make the monotonicity assumption that $c_i > g_i$ for all $i$; see [...]. For our theoretical development, throughout this section we only impose that $c_i \neq g_i$ for all $i$. This will cover the unusual situation that $c_i < g_i$, which could occur if "a little knowledge is harmful."
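The construction (20) can be sketched directly: each entry of a row of $T_{c,g}(Q)$ is a product over the items in the combination of $c_i$ (if profile $A$ can solve item $i$) or $g_i$ (otherwise), which is what $g_i E + (c_i - g_i) B_Q(I_i)$ evaluates to entrywise. The helper name and parameter values are hypothetical; dyadic values are used so that the floating-point products are exact. Setting $g = 0$ recovers $T_c(Q)$ of (17)-(18).

```python
from itertools import combinations, product

def t_cg_matrix(Q, c, g, profiles=None):
    """Saturated T_{c,g}(Q): the row for I_{i1} ^ ... ^ I_{il} is the element-wise
    product of g_i E + (c_i - g_i) B_Q(I_i), i.e. c_i if A can solve item i, else g_i."""
    m, k = len(Q), len(Q[0])
    if profiles is None:
        profiles = [a for a in product((0, 1), repeat=k) if any(a)]
    combos = [cmb for l in range(1, m + 1) for cmb in combinations(range(m), l)]
    rows = []
    for cmb in combos:
        row = []
        for a in profiles:
            val = 1.0
            for i in cmb:
                xi = all(a_j >= q_j for q_j, a_j in zip(Q[i], a))
                val *= c[i] if xi else g[i]
            row.append(val)
        rows.append(row)
    return combos, rows

Q = [[1, 0], [0, 1], [1, 1]]
PROFILES = [(1, 0), (0, 1), (1, 1)]      # column order used in (19) and (21)
c = [0.5, 0.75, 0.875]                   # hypothetical values
g = [0.25, 0.125, 0.0625]
combos, T = t_cg_matrix(Q, c, g, PROFILES)
for cmb in [(0,), (1,), (2,), (0, 1)]:   # the four rows displayed in (21)
    print(cmb, T[combos.index(cmb)])
```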



3.2 Estimation of the Q-matrix and consistency results 

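Having concluded our preparations, we are now ready to introduce our estimators for $Q$. Given $c$ and $g$, we define

$$S_{c,g}(Q) = \inf_{p' \in [0, 1]^{2^k - 1}} |T_{c,g}(Q)p' + p'_{\mathbf{0}}\,\mathbf{g} - \alpha|, \qquad (22)$$

where $p' = (p'_A : A \in \{0, 1\}^k \setminus \{\mathbf{0}\})$, $p'_{\mathbf{0}} = p'_{(0, \dots, 0)}$ is the entry for the zero attribute profile, and

$$\mathbf{g} = \begin{pmatrix} g_1 \\ \vdots \\ g_m \\ g_1 g_2 \\ \vdots \\ g_{m-1} g_m \\ g_1 g_2 g_3 \\ \vdots \end{pmatrix} \begin{matrix} I_1 \\ \vdots \\ I_m \\ I_1 \wedge I_2 \\ \vdots \\ I_{m-1} \wedge I_m \\ I_1 \wedge I_2 \wedge I_3 \\ \vdots \end{matrix} \qquad (23)$$

The labels to the right of the vector indicate the corresponding row vectors in $T_{c,g}(Q)$. The minimization in (22) is subject to the constraints that

$$p'_A \in [0, 1] \quad \text{and} \quad \sum_{A} p'_A = 1,$$

where the sum runs over all $A \in \{0, 1\}^k$, including $A = \mathbf{0}$.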
The vector $\mathbf{g}$ contains the probabilities of providing positive responses to items simply by guessing (for a subject with no attributes). We propose an estimator of the Q-matrix through a minimization problem, that is,

$$\hat Q(c, g) = \arg\inf_{Q'} S_{c,g}(Q'). \qquad (24)$$

We write $c$ and $g$ in the argument to emphasize that the estimator depends on $c$ and $g$. The computation of the minimization in (22) consists of minimizing a quadratic function subject to finitely many linear constraints. Therefore, it can be done efficiently.
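To see why (22) compares $\alpha$ with $T_{c,g}(Q)p' + p'_{\mathbf{0}}\mathbf{g}$, the sketch below simulates responses from the DINA model (16) with hypothetical parameters and compares the empirical $\alpha$ with its population counterpart $\sum_{A} p^*_A \prod_{i} P(R^i = 1 \mid A)$, the product running over the items of each combination. Because every item requires at least one attribute, the zero profile contributes exactly $p^*_{\mathbf{0}}\mathbf{g}$ and the nonzero profiles contribute $T_{c,g}(Q)p^*$. All helper names are our own.

```python
import random
from itertools import combinations

def item_combos(m):
    return [cmb for l in range(1, m + 1) for cmb in combinations(range(m), l)]

def capable(q_row, a):
    return all(a_j >= q_j for q_j, a_j in zip(q_row, a))

def model_alpha(Q, c, g, p_star):
    """Population alpha: sum over ALL profiles (zero included) of p*_A * prod P(R^i = 1 | A).
    With no zero rows in Q this equals T_{c,g}(Q) p* + p*_0 g."""
    out = []
    for cmb in item_combos(len(Q)):
        total = 0.0
        for a, pa in p_star.items():
            prob = pa
            for i in cmb:
                prob *= c[i] if capable(Q[i], a) else g[i]
            total += prob
        out.append(total)
    return out

def simulate_dina(Q, c, g, p_star, n, seed=0):
    """Draw A_r ~ p_star, then conditionally independent responses as in (16)."""
    rng = random.Random(seed)
    profiles, weights = zip(*p_star.items())
    data = []
    for a in rng.choices(profiles, weights=weights, k=n):
        data.append([int(rng.random() < (c[i] if capable(Q[i], a) else g[i]))
                     for i in range(len(Q))])
    return data

def empirical_alpha(R):
    return [sum(all(r[i] for i in cmb) for r in R) / len(R) for cmb in item_combos(len(R[0]))]

Q = [[1, 0], [0, 1], [1, 1]]
c = [0.9, 0.85, 0.8]     # hypothetical 1 - slipping probabilities
g = [0.1, 0.05, 0.0]     # hypothetical guessing; item 3 treated as open-ended
p_star = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.4}
emp = empirical_alpha(simulate_dina(Q, c, g, p_star, n=20000))
pop = model_alpha(Q, c, g, p_star)
print(max(abs(x - y) for x, y in zip(emp, pop)))   # small sampling error only
```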

Theorem 11 Suppose that $c$ and $g$ are known and that conditions C1-C4 are in force. For subject $r$, the responses are generated independently such that

$$P(R_r^i = 1 \mid \xi_r^i) = c_i^{\xi_r^i} g_i^{1 - \xi_r^i}, \qquad (25)$$

where $\xi_r^i$ is defined as in Theorem 6. Let $\hat Q(c, g)$ be defined as in (24). If $c_i \neq g_i$ for all $i$, then

$$\lim_{N \to \infty} P(\hat Q(c, g) \sim Q) = 1.$$

Furthermore, let

$$\hat p(c, g) = \arg\inf_{p} |T_{c,g}(\hat Q(c, g))p + p_{\mathbf{0}}\,\mathbf{g} - \alpha|^2,$$

subject to the constraint that $\sum_A p_A = 1$. Then, with an appropriate rearrangement of the columns of $\hat Q$, for any $\varepsilon > 0$,

$$\lim_{N \to \infty} P(|\hat p(c, g) - p^*| < \varepsilon) = 1.$$

Remark 12 There are various metrics one can employ to measure the distance between the vectors $T_{c,g}(\hat Q(c, g))\hat p + \hat p_{\mathbf{0}}\,\mathbf{g}$ and $\alpha$. In fact, any metric that generates the same topology as the Euclidean metric is sufficient to obtain the consistency results in the theorem. For instance, a principled choice of objective function would be the likelihood with $p$ profiled out. The reason we prefer the Euclidean metric (over, for instance, the full likelihood) is that $S_{c,g}(Q)$ is easier to evaluate than objective functions based on other metrics. More specifically, the computation of the current $S_{c,g}(Q)$ relies on well-established quadratic programming techniques.



4 Extension to the situation with unknown slipping probabilities

In this section, we further extend our results to the situation where the slipping probabilities are unknown and the guessing probabilities are known. In the context of standard exams, the guessing probabilities can typically be set to zero for open-ended problems. For instance, the chance of guessing the correct answer to "$(3 + 2) \times 2 = ?$" is very small. On the other hand, for multiple choice problems, the guessing probabilities cannot be ignored. In that case, $g_i$ can be taken to be $1/n$ when there are $n$ choices.
4.1 Estimator of c



We provide two estimators of $c$ given $Q$ and $g$. One is applicable to all Q-matrices but is computationally intensive. The other is computationally easy but requires certain structures of $Q$. We then combine them into a single estimator.

A general estimator. We first provide an estimator of $c$ that is applicable to all Q-matrices. Considering that the estimator of $Q$ minimizes the objective function $S_{c,g}(Q)$, we propose the following estimator of $c$:

$$\hat c(Q, g) = \arg\inf_{c \in [0, 1]^m} S_{c,g}(Q). \qquad (26)$$

A moment estimator. The computation of $\hat c(Q, g)$ is typically intensive. When the Q-matrix has a certain structure, we are able to estimate $c$ consistently based on estimating equations.

For a particular item $i$, suppose that there exist items $i_1, \dots, i_l$ (different from $i$) such that

$$B_Q(I_i \wedge I_{i_1} \wedge \cdots \wedge I_{i_l}) = B_Q(I_{i_1} \wedge \cdots \wedge I_{i_l}), \qquad (27)$$

that is, the attributes required by item $i$ are a subset of the attributes required by items $i_1, \dots, i_l$. Let $c - g = (c_1 - g_1, \dots, c_m - g_m)$, and let $T_{c-g}(Q)$ be the matrix constructed as in (17) with $c$ replaced by $c - g$.

We borrow a result which will be given in the proof of Proposition 22 (Section 6.1) to say that there exists a matrix $D$ (only depending on $g$) such that

$$D\,T_{c,g}(Q) = (\mathbf{0}, T_{c-g}(Q)).$$

Let $a_g$ and $a^*_g$ be the row vectors in $D$ corresponding to $I_{i_1} \wedge \cdots \wedge I_{i_l}$ and $I_i \wedge I_{i_1} \wedge \cdots \wedge I_{i_l}$ (in $T_{c-g}(Q)$), respectively. Then,

$$\frac{a^*_g\,\alpha}{a_g\,\alpha} = \frac{B_{c-g,Q}(I_i \wedge I_{i_1} \wedge \cdots \wedge I_{i_l})\,p^*}{B_{c-g,Q}(I_{i_1} \wedge \cdots \wedge I_{i_l})\,p^*} + o_p(1) = c_i - g_i + o_p(1), \qquad (28)$$

where the vectors $a_g$ and $a^*_g$ only depend on $g$.

Therefore, the corresponding estimator of $c_i$ would be

$$\hat c_i(Q, g) = g_i + \frac{a^*_g\,\alpha}{a_g\,\alpha}. \qquad (29)$$

Note that the computation of $\hat c_i(Q, g)$ only involves affine transformations of $\alpha$ and a single division, and is therefore very fast.
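For the simplest case of (27), with a single item $i_1$, one concrete choice of the two linear functionals (our own derivation, not necessarily the rows of the matrix $D$ used in the proof) follows from conditional independence: $E[(R^i - g_i)(R^{i_1} - g_{i_1})] = (c_i - g_i)(c_{i_1} - g_{i_1})P(\xi^i \xi^{i_1} = 1)$ and $E[R^{i_1} - g_{i_1}] = (c_{i_1} - g_{i_1})P(\xi^{i_1} = 1)$. Under (27), $\xi^i \xi^{i_1} = \xi^{i_1}$, so the ratio equals $c_i - g_i$, and both quantities are affine in the entries of $\alpha$. The sketch below applies this to simulated data with hypothetical parameters; the final clipping to $[0, 1]$ mirrors the combined estimator described next.

```python
import random

def simulate_dina(Q, c, g, p_star, n, seed=1):
    """Hypothetical simulator: A_r ~ p_star, then P(R^i = 1) = c_i if capable else g_i."""
    rng = random.Random(seed)
    profiles, weights = zip(*p_star.items())
    data = []
    for a in rng.choices(profiles, weights=weights, k=n):
        xi = [all(a_j >= q_j for q_j, a_j in zip(row, a)) for row in Q]
        data.append([int(rng.random() < (c[i] if xi[i] else g[i])) for i in range(len(Q))])
    return data

def moment_estimate_c(R, i, i1, g):
    """Estimate c_i when the attributes of item i are a subset of those of item i1 ((27), l = 1)."""
    n = len(R)
    a_i = sum(r[i] for r in R) / n
    a_i1 = sum(r[i1] for r in R) / n
    a_both = sum(r[i] * r[i1] for r in R) / n
    num = a_both - g[i] * a_i1 - g[i1] * a_i + g[i] * g[i1]   # mean of (R^i - g_i)(R^i1 - g_i1)
    den = a_i1 - g[i1]                                        # mean of R^i1 - g_i1
    return min(1.0, max(0.0, g[i] + num / den))               # clipped to [0, 1]

Q = [[1, 0], [0, 1], [1, 1]]   # items 1 and 2 each require a subset of item 3's attributes
c = [0.9, 0.85, 0.8]           # hypothetical true values
g = [0.2, 0.1, 0.05]
p_star = {(0, 0): 0.1, (1, 0): 0.3, (0, 1): 0.2, (1, 1): 0.4}
R = simulate_dina(Q, c, g, p_star, n=50000)
c1_hat = moment_estimate_c(R, 0, 2, g)
c2_hat = moment_estimate_c(R, 1, 2, g)
print(round(c1_hat, 3), round(c2_hat, 3))
```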

Proposition 13 Suppose conditions C3, (25), and (27) are true. Then $\hat c_i \to c_i$ in probability as $N \to \infty$.

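Proof of Proposition 13. By the law of large numbers,

$$a^*_g\,\alpha - a^*_g\,T_{c,g}(Q)\begin{pmatrix} p^*_{\mathbf{0}} \\ p^* \end{pmatrix} \to 0, \qquad a_g\,\alpha - a_g\,T_{c,g}(Q)\begin{pmatrix} p^*_{\mathbf{0}} \\ p^* \end{pmatrix} \to 0$$

in probability as $N \to \infty$. By the construction of $a^*_g$ and $a_g$, we have

$$a^*_g\,T_{c,g}(Q)\begin{pmatrix} p^*_{\mathbf{0}} \\ p^* \end{pmatrix} = B_{c-g,Q}(I_i \wedge I_{i_1} \wedge \cdots \wedge I_{i_l})\,p^*,$$

$$a_g\,T_{c,g}(Q)\begin{pmatrix} p^*_{\mathbf{0}} \\ p^* \end{pmatrix} = B_{c-g,Q}(I_{i_1} \wedge \cdots \wedge I_{i_l})\,p^*.$$

Thanks to (27), $B_{c-g,Q}(I_i \wedge I_{i_1} \wedge \cdots \wedge I_{i_l}) = (c_i - g_i)\,B_{c-g,Q}(I_{i_1} \wedge \cdots \wedge I_{i_l})$, and therefore

$$\frac{a^*_g\,\alpha}{a_g\,\alpha} \to c_i - g_i$$

in probability, which, by (29), gives $\hat c_i \to c_i$.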

Combined estimator. Lastly, we combine the two estimators (26) and (29). For each $Q$, we write $c = (c^*, c^{**})$, where (27) holds for each element of the sub-vector $c^*$. Let $\hat c^*(Q, g)$ be defined by (29) (element by element). For $c^{**}$, we let $\hat c^{**}(Q, g) = \arg\inf_{c^{**}} S_{(\hat c^*(Q, g), c^{**}), g}(Q)$. Finally, let $\hat c(Q, g) = (\hat c^*(Q, g), \hat c^{**}(Q, g))$. Furthermore, each element of $\hat c(Q, g)$ greater than one is set to one and each element less than zero is set to zero. Equivalently, we impose the constraint that $\hat c(Q, g) \in [0, 1]^m$.

4.2 Consistency result 

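Theorem 14 Suppose that $g$ is known and the conditions in Theorem 11 hold. Let

$$\hat Q_{\hat c}(g) = \arg\inf_{Q'} S_{\hat c(Q', g), g}(Q'), \qquad \hat p_{\hat c}(g) = \arg\inf_{p} |T_{\hat c(\hat Q_{\hat c}(g), g), g}(\hat Q_{\hat c}(g))p + p_{\mathbf{0}}\,\mathbf{g} - \alpha|^2.$$

The second optimization is subject to the constraint that $\sum_A p_A = 1$. Then,

$$\lim_{N \to \infty} P(\hat Q_{\hat c}(g) \sim Q) = 1.$$

Furthermore, if the estimator $\hat c(Q, g)$, defined in (26), is consistent, then by appropriately rearranging the columns of $\hat Q_{\hat c}(g)$, for any $\varepsilon > 0$,

$$\lim_{N \to \infty} P(|\hat p_{\hat c}(g) - p^*| < \varepsilon) = 1.$$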
Remark 15 The consistency of $\hat Q_{\hat c}(g)$ does not rely on the consistency of $\hat c(Q, g)$, which is mainly because of the central intermediate result in Proposition [...]. The consistency of $\hat c(Q, g)$ is a necessary condition for the consistency of $\hat p_{\hat c}(g)$.

In most situations, $(p^*, c)$ is estimable from the data given a correctly specified Q-matrix. Nonetheless, there are some rare occasions in which nonidentifiability does exist. We provide one example, explained at an intuitive level, to illustrate that it is not always possible to consistently estimate $c$ and $p^*$. This example simply justifies that the existence of a consistent estimator for $c$ in the above theorem is not an empty assumption. Consider a complete matrix $Q = \mathcal{I}_k$. The number of degrees of freedom of a $k$-way binary table is $2^k - 1$. On the other hand, the dimension of the parameters $(p^*, c)$ is $2^k - 1 + k$. Therefore, $p^*$ and $c$ cannot be consistently identified without additional information. This problem is typically tackled by introducing additional parametric assumptions, such as $p^*$ satisfying a certain functional form, or, in the Bayesian setting, (weakly) informative prior distributions [...]. Given that the emphasis of this paper is the inference of the Q-matrix, we do not further investigate the identifiability of $(p^*, c)$. Nonetheless, estimation of $(p^*, c)$ is definitely an important issue.



Remark 16 Assuming that the guessing probability $g_i$ is known is somewhat strong. In complicated situations, for example multiple choice problems whose incorrect choices do not look "equally incorrect", the guessing probability is typically not $1/n$. In Theorem 14, we make this assumption mostly for technical reasons.

One can certainly give unknown guessing probabilities the same treatment as the slipping probabilities, by plugging in a consistent estimator of $g_i$ or profiling it out (like $c$). We believe that in most cases such a procedure delivers a consistent estimator of the Q-matrix. However, we leave the rigorous analysis of the problem with unknown guessing probabilities to future study.



5 Discussion 

This paper provides basic theoretical results on the estimation of the Q-matrix, a key element in modern cognitive diagnosis. Under the conjunctive model assumption, sufficient conditions are developed for the Q-matrix to be identifiable up to an equivalence relation, and the corresponding consistent estimators are constructed. The equivalence relation defines a natural partition of the space of Q-matrices and may be viewed as the finest "resolution" that is possibly distinguishable based on the data, unless there is additional information about the specific meaning of each attribute. Our results provide the first steps for statistical inference about Q-matrices by explicitly specifying the conditions under which two Q-matrices lead to different response distributions. We believe that these results, especially the intermediate results in Section [...], can also be applied to general conjunctive models.

There are several directions along which further exploration may be pursued. First,
some conditions may be modified to reflect practical circumstances. For instance, if the
population is not fully diversified, meaning that certain attribute profiles may never occur,
then condition C4 cannot be expected to hold. To ensure identifiability, we will need to
impose certain structures on the Q-matrix. In the addition-multiplication example of Section
2.3, if individuals capable of multiplication are also capable of addition, then we may need
to impose the natural constraint that every item that requires multiplication also
requires addition, which in turn implies that the Q-matrix is never complete.
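The constraint and its consequence can be checked mechanically. The following is a minimal sketch, assuming column 0 of $Q$ is addition and column 1 is multiplication (hypothetical labels), and taking "complete" to mean that every unit vector $e_j$ appears as a row of $Q$. Any $Q$ that satisfies the constraint cannot contain the row $e_{\text{mult}} = (0, 1)$, because that row would require multiplication without addition.

```python
import numpy as np


def is_complete(Q):
    """Q is complete if every unit vector e_1, ..., e_k appears among its rows."""
    Q = np.asarray(Q, dtype=int)
    k = Q.shape[1]
    rows = {tuple(int(x) for x in row) for row in Q}
    return all(tuple(int(i == j) for i in range(k)) in rows for j in range(k))


def respects_prerequisite(Q, prerequisite, dependent):
    """True if every item that requires `dependent` also requires `prerequisite`."""
    Q = np.asarray(Q, dtype=int)
    return bool(np.all(Q[Q[:, dependent] == 1, prerequisite] == 1))


ADD, MUL = 0, 1                                # hypothetical column labels: addition, multiplication
Q = [[1, 0], [1, 1], [1, 1], [1, 0]]           # hypothetical Q respecting the constraint
print(respects_prerequisite(Q, ADD, MUL), is_complete(Q))   # True False
```

Any matrix that passes the first check lacks the row $(0, 1)$, so the second check necessarily fails.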

Second, when a priori "expert" knowledge of the Q-matrix is available, we may wish
to incorporate such information into the estimation, for instance in the form of an additive
penalty function attached to the objective function $S$. Such information, if correct, not only
improves estimation accuracy but also reduces the computational complexity: one can simply
minimize $S(Q)$ over a neighborhood of the expert's Q-matrix.
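The restricted search can be sketched as follows. This illustration rests on our own assumptions. "Neighborhood" is taken to mean a Hamming ball of a chosen radius around the expert's Q-matrix. The objective `toy_S` is a hypothetical stand-in for $S$, not the statistic defined in the paper. The point is the size of the search space: with $m \times k = 12$, a radius-2 ball contains 79 candidates instead of $2^{12} = 4096$.

```python
from itertools import combinations

import numpy as np


def hamming_neighbors(Q_expert, radius):
    """Yield every binary matrix within `radius` flipped entries of Q_expert (Q_expert included)."""
    Q_expert = np.asarray(Q_expert, dtype=int)
    cells = [(i, j) for i in range(Q_expert.shape[0]) for j in range(Q_expert.shape[1])]
    for r in range(radius + 1):
        for flips in combinations(cells, r):
            Q = Q_expert.copy()
            for i, j in flips:
                Q[i, j] = 1 - Q[i, j]
            yield Q


def local_minimize(S, Q_expert, radius):
    """Minimize an objective S only over the Hamming ball around the expert's Q-matrix."""
    return min(hamming_neighbors(Q_expert, radius), key=S)


Q_expert = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]])
Q_hidden = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 0, 1]])   # differs in two entries


def toy_S(Q):
    """Hypothetical stand-in for S(Q): Hamming distance to Q_hidden."""
    return int(np.abs(Q - Q_hidden).sum())


print(sum(1 for _ in hamming_neighbors(Q_expert, 2)), 2 ** Q_expert.size)   # 79 4096
print(np.array_equal(local_minimize(toy_S, Q_expert, 2), Q_hidden))         # True
```

The search succeeds only if the true matrix lies within the chosen radius. This reflects the caveat in the text that the expert's information must be correct.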

Third, throughout this paper we assume that the number of attributes (the dimension) is
known. In practice, it would be desirable to develop a data-driven way to estimate the
dimension, not only to handle situations in which the dimension is unknown, but also to check whether
the assumed dimension is correct. One possible approach is to introduce a
penalty function similar to that of BIC [23], which would give a consistent estimator of the
Q-matrix even when the dimension is unknown.

Fourth, an issue of both theoretical and practical importance is the inference of the
parameters other than the Q-matrix, such as the slipping ($s = 1 - c$) and guessing ($g$) parameters
and the attribute distribution $p^*$. In the current paper, given that the main parameter of interest
is the Q-matrix, the estimates of $p^*$ and $c$ are treated as by-products of the main
results. On the other hand, given a known $Q$, the identifiability and estimation of these parameters
are important topics. In the previous discussion, we provided a few examples of
potential identifiability issues. Further careful investigation is of great importance
and poses considerable challenges.

Fifth, the rate of convergence of the estimator $\hat Q$ deserves study. This topic is not only of theoretical
importance: from a practical point of view, it is crucial to understand the rate of convergence as the
scale of the problem, such as the number of attributes and the number of items, becomes large.

Lastly, the optimization of $S(Q)$ over the space of $m \times k$ binary matrices is a nontrivial
problem. An exhaustive search evaluates the function $S$ $2^{mk}$ times, which is a substantial
computational load if $m$ and $k$ are even moderately large. As mentioned previously, this computation
might be reduced by using additional information about the Q-matrix or by splitting the Q-matrix
into small sub-matrices. Nevertheless, it would be highly desirable to explore the structures
of the Q-matrix and of the function $S$ so as to compute $\hat Q$ more efficiently.
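The $2^{mk}$ count can be made concrete with a brute-force enumerator. This is only a sketch. The objective `toy_S` is a hypothetical stand-in for $S$. The enumeration itself does not depend on which objective is supplied.

```python
from itertools import product

import numpy as np


def all_q_matrices(m, k):
    """Enumerate all 2**(m*k) binary m x k matrices."""
    for bits in product((0, 1), repeat=m * k):
        yield np.array(bits).reshape(m, k)


def exhaustive_minimize(S, m, k):
    """Brute-force argmin of an objective S over every m x k binary matrix."""
    return min(all_q_matrices(m, k), key=S)


Q_hidden = np.array([[1, 0], [0, 1], [1, 1]])


def toy_S(Q):
    """Hypothetical stand-in for S(Q): number of entries in which Q differs from Q_hidden."""
    return int(np.abs(Q - Q_hidden).sum())


print(sum(1 for _ in all_q_matrices(3, 2)))                          # 64 = 2**(3*2)
print(np.array_equal(exhaustive_minimize(toy_S, 3, 2), Q_hidden))    # True
for m, k in [(10, 3), (20, 5), (30, 8)]:
    print(f"m={m}, k={k}: 2**{m * k} = {2 ** (m * k):.2e} evaluations")
```

Already at $m = 10$, $k = 3$ the search needs about $10^9$ evaluations of $S$, which motivates the reductions discussed above.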

6 Proofs of the theorems 

6.1 Several propositions and lemmas 

To make the discussion smooth, we postpone several long proofs to Appendix A.

Proposition 17 Suppose that $Q$ is complete and the matrix $T(Q)$ is saturated. Then we are
able to arrange the columns and rows of $Q$ and $T(Q)$ such that $T(Q)_{1:(2^k-1)}$ has full rank and
$T(Q)$ has full column rank.

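Proof of Proposition 17. Provided that $Q$ is complete, without loss of generality we
assume that the $i$-th row vector of $Q$ is $e_i^\top$ for $i = 1, \ldots, k$, that is, item $i$ only requires
attribute $i$ for each $i = 1, \ldots, k$. The first $2^k - 1$ rows of $T(Q)$ are associated with $\{I_1, \ldots, I_k\}$.
In particular, we let the first $k$ rows correspond to $I_1, \ldots, I_k$ and the first $k$ columns of $T(Q)$
correspond to the $A$'s that have only one positive attribute. We further arrange the next $C_k^2$ rows of
$T(Q)$ to correspond to combinations of two items, $I_i \wedge I_j$, $i \neq j$. The next $C_k^2$ columns of
$T(Q)$ correspond to the $A$'s that have exactly two positive attributes. Similarly, we arrange $T(Q)$
for combinations of three, four, and up to $k$ items. Therefore, the first $2^k - 1$ rows of $T(Q)$
admit a block upper-triangular form. In addition, we are able to further arrange the columns
within each block such that the diagonal blocks are identity matrices, so that $T(Q)$ has the form

$$
T(Q) = \begin{pmatrix}
\mathcal{I}_k & * & * & \cdots & * \\
0 & \mathcal{I}_{C_k^2} & * & \cdots & * \\
0 & 0 & \mathcal{I}_{C_k^3} & \cdots & * \\
\vdots & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 1 \\
* & * & * & \cdots & *
\end{pmatrix}
\begin{array}{l}
\leftarrow I_1, \ldots, I_k \\
\leftarrow I_1 \wedge I_2,\ I_1 \wedge I_3, \ldots \\
\leftarrow I_1 \wedge I_2 \wedge I_3, \ldots \\
\\
\leftarrow I_1 \wedge \cdots \wedge I_k \\
\\
\end{array}
\qquad (30)
$$

where $\mathcal{I}_r$ denotes the $r \times r$ identity matrix.

$T(Q)_{1:(2^k-1)}$ is upper triangular with unit diagonal, so it has full rank, and therefore $T(Q)$ has full column rank. ■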

From now on, we assume that $Q_{1:k} = \mathcal{I}_k$ and that the first $2^k - 1$ rows of $T(Q)$ are arranged
in the order given in (30).
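The arrangement in (30) can be checked numerically. The following sketch is our own illustration. It assumes the convention that the row of $T(Q)$ for $I_{i_1} \wedge \cdots \wedge I_{i_l}$ has entry 1 in the column of profile $A$ exactly when $A$ masters every attribute required by those items under $Q$. The helper names are hypothetical, and items and attributes are indexed from 0. Rows are ordered by the size of the item combination and columns by the number of mastered attributes, as in the proof above. The sketch confirms that the first $2^k - 1$ rows form an upper unitriangular block and that $T(Q)$ has full column rank.

```python
from itertools import combinations

import numpy as np


def item_combinations(m):
    """Nonempty subsets of items {0, ..., m-1}, ordered by size, then lexicographically."""
    return [S for r in range(1, m + 1) for S in combinations(range(m), r)]


def attribute_profiles(k):
    """Nonzero profiles A in {0,1}^k, ordered by number of attributes, then lexicographically."""
    return [[int(j in U) for j in range(k)]
            for r in range(1, k + 1) for U in combinations(range(k), r)]


def t_matrix(Q):
    """Binary T(Q): entry (S, A) is 1 iff profile A masters every attribute required by the items in S."""
    Q = np.asarray(Q, dtype=int)
    m, k = Q.shape
    rows, cols = item_combinations(m), np.array(attribute_profiles(k))
    T = np.array([np.all(cols >= Q[list(S)].max(axis=0), axis=1) for S in rows], dtype=int)
    return T, rows


Q = [[1, 0, 0], [0, 1, 0], [0, 0, 1], [1, 1, 0]]   # hypothetical complete Q, k = 3, m = 4
k = 3
T, rows = t_matrix(Q)
top = T[[r for r, S in enumerate(rows) if max(S) < k]]   # rows for combinations of the first k items

print(T.shape)                                                      # (15, 7)
print(not np.tril(top, -1).any(), bool(np.all(np.diag(top) == 1)))  # True True
print(np.linalg.matrix_rank(top), np.linalg.matrix_rank(T))         # 7 7
```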

Proposition 18 Suppose that $Q$ is complete, $T(Q)$ is saturated, and $c \succ 0$. Then $T_c(Q)$
and $T_c(Q)_{1:(2^k-1)}$ have full column rank.

Proof of Proposition 18. By Proposition 17 and the fact that $D_c$ is a diagonal
matrix of full rank as long as $c \succ 0$,

$$T_c(Q) = D_c T(Q)$$

is of full column rank. The same argument applies to the first $2^k - 1$ rows. ■
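Proposition 18 rests on $T_c(Q) = D_c T(Q)$ with $D_c$ diagonal. The sketch below makes an assumption about $D_c$, following the DINA model without guessing: the diagonal entry of $D_c$ for the row $I_{i_1} \wedge \cdots \wedge I_{i_l}$ is taken to be $c_{i_1} \cdots c_{i_l}$. The matrix $Q$ and the values of $c$ are hypothetical. The rank is preserved when every $c_j > 0$ and drops when some $c_j = 0$, which is why the positivity condition is needed.

```python
from itertools import combinations

import numpy as np


def t_matrix(Q):
    """Binary T(Q) with rows = nonempty item subsets, columns = nonzero attribute profiles."""
    Q = np.asarray(Q, dtype=int)
    m, k = Q.shape
    rows = [S for r in range(1, m + 1) for S in combinations(range(m), r)]
    cols = np.array([[int(j in U) for j in range(k)]
                     for r in range(1, k + 1) for U in combinations(range(k), r)])
    T = np.array([np.all(cols >= Q[list(S)].max(axis=0), axis=1) for S in rows], dtype=int)
    return T, rows


def t_c_matrix(Q, c):
    """T_c(Q) = D_c T(Q), with D_c = diag(prod_{j in S} c_j) over the item subsets S (assumed form)."""
    T, rows = t_matrix(Q)
    D_c = np.diag([np.prod([c[j] for j in S]) for S in rows])
    return D_c @ T


Q = [[1, 0], [0, 1], [1, 1]]                                      # hypothetical complete Q, k = 2, m = 3
print(np.linalg.matrix_rank(t_c_matrix(Q, [0.9, 0.8, 0.7])))    # 3: full column rank when c > 0
print(np.linalg.matrix_rank(t_c_matrix(Q, [0.0, 0.8, 0.7])))    # 2: c_1 = 0 zeroes every row involving item 1
```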

The following two propositions, which compare the column spaces of $T_c(Q)$ and $T_{c'}(Q')$,
are central to the proofs of all the theorems. Their proofs are deferred to the appendix.

The first proposition discusses the case where $Q'_{1:k}$ is complete. We can always rearrange
the columns of $Q'$ so that $Q_{1:k} = Q'_{1:k}$. In addition, according to the proof of Proposition
17, the last column vector of $T_c(Q)$ corresponds to the attribute profile $A = (1, \ldots, 1)^\top$. Therefore, all
entries of this column vector are nonzero.

Proposition 19 Assume that $Q$ is a complete matrix and $T(Q)$ is saturated. Without loss
of generality, let $Q_{1:k} = \mathcal{I}_k$. Assume that the first $k$ rows of $Q'$ form a complete matrix and,
further, that $Q_{1:k} = Q'_{1:k} = \mathcal{I}_k$. Let $V^*$ denote the last column vector of $T_c(Q)$. If
$Q' \nsim Q$ and $c \succ 0$, then there exists at least one column vector of $T_c(Q)$ (independent of $c$),
denoted by $V \in C_{T_c(Q)}$, such that either $V$ or $V^*$ is not in the column space $C(T_{c'}(Q'))$ for
all $c' \in \mathbb{R}^m$.

The next proposition discusses the case where $Q'_{1:k}$ is incomplete.

Proposition 20 Assume that $Q$ is a complete matrix and $T(Q)$ is saturated. Without loss
of generality, let $Q_{1:k} = \mathcal{I}_k$. Let $V^*$ denote the last column of $T_c(Q)$. If $c \succ 0$ and $Q'_{1:k}$ is
incomplete, there exists at least one column vector of $T_c(Q)$ (independent of $c$), denoted by
$V \in C_{T_c(Q)}$, such that either $V$ or $V^*$ is not in the column space $C(T_{c'}(Q'))$ for all $c' \in [0,1]^m$.
The next result is a direct corollary of these two propositions. It follows by setting $c_i = 1$
and $g_i = 0$ for all $i = 1, \ldots, m$.

Corollary 21 Suppose that conditions C1 and C2 are in force. If $Q \nsim Q'$, then $C(T(Q))$
and $C(T(Q'))$ are different.
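Corollary 21 can be checked on a small case by testing column-space membership through ranks: $v \in C(M)$ exactly when appending $v$ to $M$ leaves the rank unchanged. The sketch uses hypothetical matrices $Q$ and $Q'$ with $k = 2$ and $m = 3$. In $Q'$, the third item requires only the first attribute. Items and attributes are indexed from 0, and $T(\cdot)$ follows the same convention as in the proof of Proposition 17.

```python
from itertools import combinations

import numpy as np


def t_matrix(Q):
    """Binary T(Q): rows = nonempty item subsets, columns = nonzero attribute profiles."""
    Q = np.asarray(Q, dtype=int)
    m, k = Q.shape
    rows = [S for r in range(1, m + 1) for S in combinations(range(m), r)]
    profiles = [tuple(int(j in U) for j in range(k))
                for r in range(1, k + 1) for U in combinations(range(k), r)]
    cols = np.array(profiles)
    T = np.array([np.all(cols >= Q[list(S)].max(axis=0), axis=1) for S in rows], dtype=int)
    return T, profiles


def in_column_space(M, v):
    """v lies in C(M) iff appending v as an extra column does not increase the rank."""
    return np.linalg.matrix_rank(np.column_stack([M, v])) == np.linalg.matrix_rank(M)


Q = [[1, 0], [0, 1], [1, 1]]          # hypothetical true Q (complete)
Q_alt = [[1, 0], [0, 1], [1, 0]]      # hypothetical Q' that is not equivalent to Q
T, profiles = t_matrix(Q)
T_alt, _ = t_matrix(Q_alt)

outside = [A for A, col in zip(profiles, T.T) if not in_column_space(T_alt, col)]
print(outside)    # [(1, 0)]: the column of profile (1, 0) is not in C(T(Q'))
```

A rank-based test is adequate for small 0/1 matrices like these. For larger or noisy matrices, the tolerance used by the rank computation would matter.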

To obtain a similar proposition for the cases where the $g_i$'s are nonzero, we need to
expand $T_{c,g}(Q)$ as follows. As previously defined, let

$$
\tilde T_{c,g}(Q) = \begin{pmatrix} g & T_{c,g}(Q) \\ 1 & \mathbf{1}^\top \end{pmatrix}. \qquad (31)
$$

The last row of $\tilde T_{c,g}(Q)$ consists entirely of ones. The vector $g$ is defined as in (23).

Proposition 22 Suppose that $Q$ is a complete matrix, $Q' \nsim Q$, $T$ is saturated, and $c \succ g$.
Let $V^*$ denote the last column of $\tilde T_{c,g}(Q)$. Then there exists one column vector of $\tilde T_{c,g}(Q)$
(independent of $c$), denoted by $V$, such that either $V$ or $V^*$ is not in $C(\tilde T_{c',g}(Q'))$ for all
$c' \in [0,1]^m$. In addition, $\tilde T_{c,g}(Q)$ is of full column rank.

To prove Proposition 22, we need the following lemma.

Lemma 23 Consider two matrices $T_1$ and $T_2$ of the same dimension. If $C(T_1) \subset C(T_2)$,
then for any matrix $D$ of appropriate dimension for multiplication, we have

$$C(DT_1) \subset C(DT_2).$$

Conversely, if the $l$-th column vector of $DT_1$ does not belong to $C(DT_2)$, then the $l$-th
column vector of $T_1$ does not belong to $C(T_2)$.

Proof of Lemma 23. Note that $DT_i$ is just a linear row transformation of $T_i$ for $i = 1, 2$. If a vector
$v$ satisfies $v = T_2 w$ for some $w$, then $Dv = (DT_2)w$. Applying this to all columns of $T_1$ gives the
first claim, and applying it to the $l$-th column alone gives the second claim by contraposition. ■

Proof of Proposition 22. Thanks to Lemma 23, we only need to find a matrix $D$ such
that there exists one column vector $V_D$ of $D\tilde T_{c,g}(Q)$ such that either $V_D$ or the last column
vector of $D\tilde T_{c,g}(Q)$ does not belong to the column space of $D\tilde T_{c',g}(Q')$ for all $c' \in [0,1]^m$.
We define

$$c_g = (c_1 - g_1, \ldots, c_m - g_m), \qquad c'_g = (c'_1 - g_1, \ldots, c'_m - g_m).$$

We claim that there exists a matrix $D$ such that

$$D\tilde T_{c,g}(Q) = (0, T_{c_g}(Q))$$

and

$$D\tilde T_{c',g}(Q') = (0, T_{c'_g}(Q')),$$

where the choice of $D$ depends on neither $c$ nor $c'$. In the rest of the proof, we construct such
a matrix $D$ for $\tilde T_{c,g}(Q)$; the verification for $\tilde T_{c',g}(Q')$ is completely analogous. Note that
each row of $D\tilde T_{c,g}(Q)$ is just a linear combination of rows of $\tilde T_{c,g}(Q)$. Therefore, it suffices to
show that every row vector of the form

$$(0, B_{c_g,Q}(I_{i_1} \wedge \cdots \wedge I_{i_l}))$$

can be written as a linear combination of the row vectors of $\tilde T_{c,g}(Q)$. We prove this by
induction. Let $E$ denote the last row of $\tilde T_{c,g}(Q)$, whose entries are all one. First note that, for each $1 \le i \le m$,

$$(0, B_{c_g,Q}(I_i)) = (g_i, B_{c,g,Q}(I_i)) - g_i E. \qquad (32)$$

Suppose that all rows of the form

$$(0, B_{c_g,Q}(I_{i_1} \wedge \cdots \wedge I_{i_l}))$$

for all $1 \le l \le j$ can be written as linear combinations of the row vectors of $\tilde T_{c,g}(Q)$ with
coefficients depending only on $g_1, \ldots, g_m$. Thanks to (32), the case $j = 1$ holds. Suppose
the statement holds for some general $j$. We consider the case of $j + 1$. Let "$*$" denote
element-by-element multiplication. By definition,

$$(g_{i_1} \cdots g_{i_{j+1}}, B_{c,g,Q}(I_{i_1} \wedge \cdots \wedge I_{i_{j+1}})) = (g_{i_1}, B_{c,g,Q}(I_{i_1})) * \cdots * (g_{i_{j+1}}, B_{c,g,Q}(I_{i_{j+1}})). \qquad (33)$$

For every vector $V$ of appropriate length,

$$E * V = V.$$

Using (32), we write each factor on the right-hand side of (33) as $(0, B_{c_g,Q}(I_{i_h})) + g_{i_h}E$ and
expand the product. The last term is

$$(0, B_{c_g,Q}(I_{i_1} \wedge \cdots \wedge I_{i_{j+1}})) = (0, B_{c_g,Q}(I_{i_1})) * \cdots * (0, B_{c_g,Q}(I_{i_{j+1}})).$$

From the induction assumption and definition (18), the other terms on both sides of (33)
belong to the row space of $\tilde T_{c,g}(Q)$. Therefore, $(0, B_{c_g,Q}(I_{i_1} \wedge \cdots \wedge I_{i_{j+1}}))$ is also in the row
space of $\tilde T_{c,g}(Q)$. In addition, all the corresponding coefficients depend only on the $g_i$'s. Therefore,
one can construct a $(2^m - 1) \times 2^m$ matrix $D$ such that

$$D\tilde T_{c,g}(Q) = (0, T_{c_g}(Q)).$$

Because $D$ is free of $c$ and $Q$, we have

$$D\tilde T_{c',g}(Q') = (0, T_{c'_g}(Q')).$$

In addition, thanks to Propositions 19 and 20, there exists a column vector $V_D$ of $D\tilde T_{c,g}(Q)$
such that either the last column vector or $V_D$ does not belong to the column space of
$D\tilde T_{c',g}(Q')$ for all $c' \in [0,1]^m$. Therefore, by Lemma 23, the corresponding column vector of
$\tilde T_{c,g}(Q)$ is not in the column space of $\tilde T_{c',g}(Q')$.
In addition,

$$\begin{pmatrix} D \\ e_{2^m} \end{pmatrix} \tilde T_{c,g}(Q) = \begin{pmatrix} 0 & T_{c_g}(Q) \\ 1 & \mathbf{1}^\top \end{pmatrix}$$

is of full column rank (by Proposition 18, since $c_g \succ 0$), where $e_{2^m}$ is a $2^m$-dimensional row
vector whose last element is one and whose other elements are zero. Therefore, $\tilde T_{c,g}(Q)$ is also of
full column rank. ■
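The induction above constructs $D$ implicitly. One explicit choice comes from expanding $\prod_{j \in S}(x_j - g_j)$, where $x_j$ stands for the row $(g_j, B_{c,g,Q}(I_j))$. This is our own inclusion-exclusion rewriting of the same argument, not a formula stated in the paper. Under this choice, the row of $D$ for an item subset $S$ puts weight $(-1)^{|S \setminus U|}\prod_{j \in S \setminus U} g_j$ on the row of $\tilde T_{c,g}(Q)$ for each $U \subseteq S$, with $U = \emptyset$ standing for the last row $E$. The Python sketch below checks numerically that $D\tilde T_{c,g}(Q) = (0, T_{c_g}(Q))$ and that $\tilde T_{c,g}(Q)$ has full column rank. It makes three assumptions: the response probability for item $j$ is $c_j\xi_j + g_j(1 - \xi_j)$, every row of $Q$ is nonzero, and the values of $Q$, $c$, and $g$ are hypothetical.

```python
from itertools import combinations

import numpy as np


def tilde_t(Q, c, g):
    """tilde T_{c,g}(Q): one row per nonempty item subset S plus a final row of ones (E).

    Columns run over all A in {0,1}^k with A = 0 first; entry (S, A) is
    prod_{j in S} [c_j xi_j(A) + g_j (1 - xi_j(A))], where xi_j(A) = 1 iff A masters item j.
    """
    Q = np.asarray(Q, dtype=int)
    m, k = Q.shape
    profiles = np.array([[int(j in U) for j in range(k)]
                         for r in range(k + 1) for U in combinations(range(k), r)])
    xi = np.all(profiles[None, :, :] >= Q[:, None, :], axis=2).astype(float)
    P = c[:, None] * xi + g[:, None] * (1 - xi)
    subsets = [S for r in range(1, m + 1) for S in combinations(range(m), r)]
    body = np.array([np.prod(P[list(S)], axis=0) for S in subsets])
    return np.vstack([body, np.ones(len(profiles))]), subsets, xi


def d_matrix(subsets, g):
    """Row S of D: weight (-1)^{|S - U|} prod_{j in S - U} g_j on row U of tilde T (U empty -> E)."""
    index = {S: i for i, S in enumerate(subsets)}
    D = np.zeros((len(subsets), len(subsets) + 1))
    for i, S in enumerate(subsets):
        for r in range(len(S) + 1):
            for U in combinations(S, r):
                rest = [j for j in S if j not in U]
                column = index[U] if U else len(subsets)      # last column of D selects E
                D[i, column] += (-1) ** len(rest) * np.prod([g[j] for j in rest])
    return D


Q = np.array([[1, 0], [0, 1], [1, 1]])        # hypothetical complete Q (every row nonzero)
c = np.array([0.9, 0.85, 0.8])                # hypothetical c and g with c > g entrywise
g = np.array([0.2, 0.1, 0.25])

T_tilde, subsets, xi = tilde_t(Q, c, g)
D = d_matrix(subsets, g)
target = np.array([np.prod(((c - g)[:, None] * xi)[list(S)], axis=0) for S in subsets])

print(D.shape)                                              # (7, 8), i.e. (2^m - 1) x 2^m
print(np.allclose(D @ T_tilde, target))                     # True: first column 0, rest T_{c_g}(Q)
print(np.linalg.matrix_rank(T_tilde) == T_tilde.shape[1])   # True: full column rank
```

The coefficients of $D$ involve only $g$, which matches the observation in the proof that the same $D$ works for $\tilde T_{c',g}(Q')$.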
6.2 Proof of the theorems



Using the results of the previous propositions and lemmas, we now proceed to prove our 
theorems. 

Proof of Theorem 6. Consider $Q' \nsim Q$ and $T(\cdot)$ saturated. According to Corollary 21,
the column spaces of $T(Q)$ and $T(Q')$ are different: there exists at least one column vector
$V \in C_{T(Q)}$ such that $V \notin C(T(Q'))$. Recall that $\tilde p$ is the vector containing the $\tilde\beta_A$'s with $A \neq 0$,
where

$$\tilde\beta_A = \frac{1}{N} \sum_{r=1}^{N} 1(A^r = A).$$

For any $p^* \succ 0$, since $\tilde p \to p^*$ almost surely, $\alpha = T(Q)\tilde p$, and $T(Q)p^* \notin C(T(Q'))$, there
exists $\delta > 0$ such that

$$\lim_{N \to \infty} P\Big(\inf_{p \in [0,1]^{2^k - 1}} |T(Q')p - \alpha| > \delta\Big) = 1$$

and

$$P\Big(\inf_{p \in [0,1]^{2^k - 1}} |T(Q)p - \alpha| = 0\Big) = 1.$$

Given that there are finitely many $m \times k$ binary matrices, $P(\hat Q \sim Q) \to 1$ as $N \to \infty$. In
fact, we can arrange the columns of $\hat Q$ such that $P(\hat Q = Q) \to 1$ as $N \to \infty$.
Note that $\tilde p$ satisfies the identity

$$T(Q)\tilde p = \alpha.$$

In addition, since $T(Q)$ is of full column rank (Proposition 17), the solution to the above linear
equation is unique. Therefore, the solution to the optimization problem $\inf_p |T(Q)p - \alpha|$ is
unique and equals $\tilde p$. Notice that when $\hat Q = Q$, $\hat p = \arg\inf_p |T(\hat Q)p - \alpha| = \tilde p$. Therefore,

$$\lim_{N \to \infty} P(\hat p = \tilde p) = 1.$$

Together with the consistency of $\tilde p$, the conclusion of the theorem follows immediately. ■
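The two facts used in this proof can be seen in a small simulation. When $\hat Q = Q$, the statistic $S(Q)$ is exactly zero and $\hat p = \tilde p$. For $Q' \nsim Q$, $S(Q')$ stays away from zero. The sketch assumes the setting of Theorem 6 with no slipping and no guessing ($c = 1$, $g = 0$). It uses hypothetical matrices $Q$ and $Q'$ and a uniform $p^*$. It also replaces the infimum over $p \in [0,1]^{2^k-1}$ by unconstrained least squares, which can only make $S(Q')$ smaller, so a clearly positive value remains informative.

```python
from itertools import combinations

import numpy as np


def t_matrix(Q):
    """Binary T(Q) plus its row labels (item subsets) and column labels (nonzero profiles)."""
    Q = np.asarray(Q, dtype=int)
    m, k = Q.shape
    rows = [S for r in range(1, m + 1) for S in combinations(range(m), r)]
    cols = np.array([[int(j in U) for j in range(k)]
                     for r in range(1, k + 1) for U in combinations(range(k), r)])
    T = np.array([np.all(cols >= Q[list(S)].max(axis=0), axis=1) for S in rows], dtype=int)
    return T, rows, cols


def alpha_vector(R, rows):
    """alpha_S = (1/N) sum_r prod_{j in S} R_rj: share of subjects answering every item in S correctly."""
    return np.array([R[:, list(S)].prod(axis=1).mean() for S in rows])


def s_statistic(Q, alpha):
    """Unconstrained least-squares version of inf_p |T(Q) p - alpha| (a lower bound for p in [0,1])."""
    T, _, _ = t_matrix(Q)
    p, *_ = np.linalg.lstsq(T, alpha, rcond=None)
    return np.linalg.norm(T @ p - alpha), p


rng = np.random.default_rng(0)
Q_true = np.array([[1, 0], [0, 1], [1, 1]])
Q_alt = np.array([[1, 0], [0, 1], [1, 0]])          # hypothetical Q' not equivalent to Q_true
N = 5000
profiles = np.array([[0, 0], [1, 0], [0, 1], [1, 1]])
A = profiles[rng.integers(0, 4, size=N)]             # p* uniform over {0,1}^2, so p* > 0
R = np.all(A[:, None, :] >= Q_true[None, :, :], axis=2).astype(int)   # ideal responses

_, rows, cols = t_matrix(Q_true)
alpha = alpha_vector(R, rows)
s_true, p_hat = s_statistic(Q_true, alpha)
s_alt, _ = s_statistic(Q_alt, alpha)
p_emp = np.array([np.mean(np.all(A == col, axis=1)) for col in cols])   # empirical p over A != 0

print(s_true < 1e-10, s_alt > 0.05, np.allclose(p_hat, p_emp))   # True True True
```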
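Proof of Theorem 11. Note that for all $Q'$,

$$\tilde T_{c,g}(Q')\begin{pmatrix} p_0 \\ p \end{pmatrix} - \begin{pmatrix} \alpha \\ 1 \end{pmatrix} = \begin{pmatrix} T_{c,g}(Q')p + p_0 g - \alpha \\ p_0 + \mathbf{1}^\top p - 1 \end{pmatrix}.$$

By the law of large numbers,

$$|T_{c,g}(Q)p^* + p_0^* g - \alpha| \to 0$$

almost surely as $N \to \infty$. Therefore,

$$S_{c,g}(Q) = \inf_{p_0, p}\left|\tilde T_{c,g}(Q)\begin{pmatrix} p_0 \\ p \end{pmatrix} - \begin{pmatrix} \alpha \\ 1 \end{pmatrix}\right| \to 0$$

almost surely as $N \to \infty$.
For any $Q' \nsim Q$, note that, almost surely as $N \to \infty$,

$$\begin{pmatrix} \alpha \\ 1 \end{pmatrix} \to \tilde T_{c,g}(Q)\begin{pmatrix} p_0^* \\ p^* \end{pmatrix}.$$

According to Proposition 25 and the fact that $p^* \succ 0$, there exists $\delta(c') > 0$ such that $\delta(c')$
is continuous in $c'$ and

$$\inf_{p, p_0}\left|\tilde T_{c',g}(Q')\begin{pmatrix} p_0 \\ p \end{pmatrix} - \tilde T_{c,g}(Q)\begin{pmatrix} p_0^* \\ p^* \end{pmatrix}\right| > \delta(c').$$

Since $\delta(\cdot)$ is continuous and positive on the compact set $[0,1]^m$,

$$\delta = \inf_{c' \in [0,1]^m} \delta(c') > 0$$

and

$$\inf_{c', p, p_0}\left|\tilde T_{c',g}(Q')\begin{pmatrix} p_0 \\ p \end{pmatrix} - \tilde T_{c,g}(Q)\begin{pmatrix} p_0^* \\ p^* \end{pmatrix}\right| \ge \delta.$$

Therefore,

$$P\left(\inf_{p, p_0}\left|\tilde T_{c,g}(Q')\begin{pmatrix} p_0 \\ p \end{pmatrix} - \begin{pmatrix} \alpha \\ 1 \end{pmatrix}\right| > \delta/2\right) \to 1$$

as $N \to \infty$. For the same $\delta$, we have

$$P\left(\inf_{c', p, p_0}\left|\tilde T_{c',g}(Q')\begin{pmatrix} p_0 \\ p \end{pmatrix} - \begin{pmatrix} \alpha \\ 1 \end{pmatrix}\right| > \delta/2\right) = P\left(\inf_{c'} S_{c',g}(Q') > \delta/2\right) \to 1.$$

The minimization on the left-hand side of the above equation is subject to the constraint that

$$\sum_{A \in \{0,1\}^k} p_A = 1.$$

Together with the fact that there are only finitely many $m \times k$ binary matrices, we have

$$P(\hat Q(c,g) \sim Q) \to 1.$$

We arrange the columns of $\hat Q(c,g)$ so that $P(\hat Q(c,g) = Q) \to 1$ as $N \to \infty$.
Now we proceed to the proof of consistency of $\hat p(c,g)$. Note that

$$\left|\tilde T_{c,g}(\hat Q(c,g))\begin{pmatrix} \hat p_0(c,g) \\ \hat p(c,g) \end{pmatrix} - \begin{pmatrix} \alpha \\ 1 \end{pmatrix}\right| \to 0, \qquad \left|\tilde T_{c,g}(Q)\begin{pmatrix} p_0^* \\ p^* \end{pmatrix} - \begin{pmatrix} \alpha \\ 1 \end{pmatrix}\right| \to 0.$$

Since $\tilde T_{c,g}(Q)$ is a full column rank matrix and $P(\hat Q(c,g) = Q) \to 1$, $\hat p(c,g) \to p^*$ in
probability. ■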
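The separation used in the proof above can be illustrated at the population level. In this sketch, the sample vector $(\alpha, 1)$ is replaced by its almost-sure limit $\tilde T_{c,g}(Q)(p_0^*, p^*)$. With that substitution, $S_{c,g}(Q)$ is exactly zero while $S_{c,g}(Q')$ is strictly positive. Several assumptions are ours. The matrices $Q$ and $Q'$, the parameter values $c$ and $g$ (with $c \succ g$), and the uniform attribute distribution are hypothetical. The response probability is assumed to be $c_j\xi_j + g_j(1 - \xi_j)$. The sketch fixes $c' = c$ instead of taking the infimum over $c'$. Finally, unconstrained least squares stands in for the constrained infimum over $(p_0, p)$.

```python
from itertools import combinations

import numpy as np


def tilde_t(Q, c, g):
    """tilde T_{c,g}(Q): rows = nonempty item subsets plus a row of ones; columns = all A, A = 0 first."""
    Q = np.asarray(Q, dtype=int)
    m, k = Q.shape
    profiles = np.array([[int(j in U) for j in range(k)]
                         for r in range(k + 1) for U in combinations(range(k), r)])
    xi = np.all(profiles[None, :, :] >= Q[:, None, :], axis=2).astype(float)   # xi[j, A]
    P = c[:, None] * xi + g[:, None] * (1 - xi)       # P(item j correct | A)
    subsets = [S for r in range(1, m + 1) for S in combinations(range(m), r)]
    body = np.array([np.prod(P[list(S)], axis=0) for S in subsets])
    return np.vstack([body, np.ones(len(profiles))])


def S_cg(Q, c, g, target):
    """Unconstrained least-squares stand-in for inf over (p_0, p) of |tilde T (p_0, p) - (alpha, 1)|."""
    T = tilde_t(Q, c, g)
    x, *_ = np.linalg.lstsq(T, target, rcond=None)
    return np.linalg.norm(T @ x - target)


Q_true = np.array([[1, 0], [0, 1], [1, 1]])
Q_alt = np.array([[1, 0], [0, 1], [1, 0]])          # hypothetical Q' not equivalent to Q_true
c = np.array([0.9, 0.85, 0.8])                      # hypothetical known c and g with c > g
g = np.array([0.2, 0.1, 0.25])
weights = np.full(4, 0.25)                          # (p_0*, p*) uniform over {0,1}^2

target = tilde_t(Q_true, c, g) @ weights            # population value of (alpha, 1)
print(S_cg(Q_true, c, g, target) < 1e-10, S_cg(Q_alt, c, g, target) > 1e-6)   # True True
```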

Proof of Theorem 14. Assuming $g$ is known, note that

$$\inf_{p_0, p}\left|\tilde T_{c,g}(Q)\begin{pmatrix} p_0 \\ p \end{pmatrix} - \begin{pmatrix} \alpha \\ 1 \end{pmatrix}\right|$$

is a continuous function of $c$. According to the results of Proposition [?] and the definition of $\hat c$
in Section 4.1, we obtain that

$$\inf_{p_0, p}\left|\tilde T_{\hat c(Q,g),g}(Q)\begin{pmatrix} p_0 \\ p \end{pmatrix} - \begin{pmatrix} \alpha \\ 1 \end{pmatrix}\right| \to 0$$

in probability as $N \to \infty$. In addition, thanks to Proposition 22 and with an argument similar
to that in the proof of Theorem 11, $\hat Q_c(g)$ is a consistent estimator.

Furthermore, if $\hat c(Q,g)$ is a consistent estimator, then $\hat c(\hat Q_c(g), g)$ is also consistent. Then
the consistency of $\hat p_c(g)$ follows from the facts that $\hat Q_c(g)$ is consistent and $\tilde T_{c,g}(Q)$ is of full
column rank. ■



A Technical proofs 



Proof of Proposition 19. Note that $Q_{1:k} = Q'_{1:k} = \mathcal{I}_k$. Let $T(\cdot)$ be arranged as in (30).
Then $T(Q)_{1:(2^k-1)} = T(Q')_{1:(2^k-1)}$. Given that $Q \neq Q'$, we have $T(Q) \neq T(Q')$. We assume
that $T(Q)_{li} \neq T(Q')_{li}$, where $T(Q)_{li}$ is the entry in the $l$-th row and $i$-th column. Since
$T(Q)_{1:(2^k-1)} = T(Q')_{1:(2^k-1)}$, it is necessary that $l \ge 2^k$.

Suppose that the $l$-th row of $T(Q')$ corresponds to an item that requires attributes
$i_1, \ldots, i_r$. Then we consider $1 \le h \le 2^k - 1$ such that the $h$-th row of $T(Q')$ is
$B_{Q'}(I_{i_1} \wedge \cdots \wedge I_{i_r})$. Then the $h$-th row vector and the $l$-th row vector of $T(Q')$ are identical.

Since $T(Q)_{1:(2^k-1)} = T(Q')_{1:(2^k-1)}$, we have $T(Q)_{hj} = T(Q')_{hj} = T(Q')_{lj}$ for
$j = 1, \ldots, 2^k - 1$. If $T(Q)_{li} = 0$ and $T(Q')_{li} = 1$, the matrices $T(Q')$ and $T(Q)$ look like

$$
T(Q') = \begin{array}{r} \\ \text{row } h \rightarrow \\ \\ \text{row } l \rightarrow \\ \\ \end{array}
\begin{pmatrix}
 & \vdots & \\
\cdots & 1 & \cdots \\
 & \vdots & \\
\cdots & 1 & \cdots \\
 & \vdots &
\end{pmatrix}
\quad \text{and} \quad
T(Q) = \begin{array}{r} \\ \text{row } h \rightarrow \\ \\ \text{row } l \rightarrow \\ \\ \end{array}
\begin{pmatrix}
 & \vdots & \\
\cdots & 1 & \cdots \\
 & \vdots & \\
\cdots & 0 & \cdots \\
 & \vdots &
\end{pmatrix},
$$

where the displayed entries lie in column $i$.

Case 1 The $h$-th and $l$-th row vectors of $T_{c'}(Q')$ are nonzero vectors. We claim that the $i$-th
column vector of $T_c(Q)$ is not in the column space of $T_{c'}(Q')$. This follows from two facts.

First, since $T(Q)_{hi} \neq T(Q)_{li}$, one of $T_c(Q)_{hi}$ and $T_c(Q)_{li}$ is zero and the other is nonzero.

Second, because the $h$-th row and the $l$-th row of $T(Q')$ are identical, the $h$-th and
$l$-th entries of every column vector of $T(Q')$ are identical, and so are those of every vector in the
column space of $T(Q')$. Since $T_{c'}(Q') = D_{c'}T(Q')$ multiplies the $h$-th and $l$-th rows by factors
that are nonzero in this case, for every vector in the column space of $T_{c'}(Q')$
the $h$-th and $l$-th entries are either both zero or both nonzero.

Combining these two facts, we obtain that the $i$-th column vector of $T_c(Q)$ is not in the
column space of $T_{c'}(Q')$.

Case 2 Either the $h$-th or the $l$-th row vector of $T_{c'}(Q')$ is a zero vector. Then every vector in
the column space of $T_{c'}(Q')$ has a zero entry in that position, whereas all elements of the last
column of $T_c(Q)$, $V^*$, are nonzero. Hence $V^*$ is not in the column space of $T_{c'}(Q')$.

■
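The mechanism of this proof can be searched for directly: find a pair of rows that coincide in $T(Q')$ but differ in $T(Q)$. The sketch below is our own brute-force illustration using hypothetical matrices $Q$ and $Q'$ with $Q_{1:k} = Q'_{1:k}$, as in Proposition 19. Rows are labelled by the item subsets they represent, with items indexed from 0.

```python
from itertools import combinations

import numpy as np


def t_matrix(Q):
    """Binary T(Q) with rows = nonempty item subsets, columns = nonzero attribute profiles."""
    Q = np.asarray(Q, dtype=int)
    m, k = Q.shape
    rows = [S for r in range(1, m + 1) for S in combinations(range(m), r)]
    cols = np.array([[int(j in U) for j in range(k)]
                     for r in range(1, k + 1) for U in combinations(range(k), r)])
    T = np.array([np.all(cols >= Q[list(S)].max(axis=0), axis=1) for S in rows], dtype=int)
    return T, rows


def separating_row_pairs(Q, Q_alt):
    """Pairs (h, l) of item subsets whose rows coincide in T(Q_alt) but differ in T(Q)."""
    T, rows = t_matrix(Q)
    T_alt, _ = t_matrix(Q_alt)
    return [(rows[h], rows[l])
            for h, l in combinations(range(len(rows)), 2)
            if np.array_equal(T_alt[h], T_alt[l]) and not np.array_equal(T[h], T[l])]


Q = [[1, 0], [0, 1], [1, 1]]          # hypothetical complete Q
Q_alt = [[1, 0], [0, 1], [1, 0]]      # hypothetical Q' with the same first k rows
print(separating_row_pairs(Q, Q_alt))   # [((0,), (2,)), ((0,), (0, 2))]
```

In the paper's 1-based labels, the first pair is $I_1$ and $I_3$. This is exactly the pair $(h, l)$ constructed above, because under $Q'$ item 3 requires only attribute 1.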

Proof of Proposition 20. $T(\cdot)$ is arranged as in (30). As in the proof of Proposition 19,
it is sufficient to show that there exist two row vectors ($h$ and $l$) of $T(Q')$ which are identical,
that is, $T(Q')_{hj} = T(Q')_{lj}$ for $j = 1, \ldots, 2^k - 1$, while the $h$-th and $l$-th row vectors of $T(Q)$
are different. Once we have identified two such row vectors, we have the following two cases.

Case 1 The $h$-th and $l$-th row vectors of $T_{c'}(Q')$ are nonzero vectors. For all vectors in the
column space of $T_{c'}(Q')$, the $h$-th and $l$-th elements are either both zero or both nonzero.
Then there exists a column vector of $T_c(Q)$ such that one of its $h$-th and $l$-th
elements is zero and the other is nonzero. Therefore, this column vector is not in the
column space of $T_{c'}(Q')$.

Case 2 Either the $h$-th or the $l$-th row vector of $T_{c'}(Q')$ is a zero vector. Then the last column of
$T_c(Q)$, $V^*$, is not in the column space of $T_{c'}(Q')$.

In what follows, we identify two such row vectors. It turns out that we only need to
consider the first $k$ items. Consider $Q'$ such that $Q'_{1:k}$ is incomplete. We discuss the following
situations.

1. There are two row vectors, say the $i$-th and $j$-th row vectors ($1 \le i < j \le k$), of $Q'_{1:k}$ that
are identical. Equivalently, two items require exactly the same attributes according to
$Q'$. Then the row vectors of $T(Q')$ corresponding to these two items are identical. All
of the first $2^k - 1$ row vectors of $T(Q)$ must be different, because $T(Q)_{1:(2^k-1)}$ has rank
$2^k - 1$.

2. No two row vectors of $Q'_{1:k}$ are identical. Then, among the first $k$ rows of $Q'$, there is
at least one row vector containing two or more nonzero entries. That is, there exists
$1 \le i \le k$ such that

$$\sum_{j=1}^{k} Q'_{ij} > 1.$$

This is because if each of the first $k$ items requires only one attribute and $Q'_{1:k}$ is not
complete, then at least two items require the same attribute; hence there are
two identical row vectors in $Q'_{1:k}$, which is the first situation. We define

$$a_i = \sum_{j=1}^{k} Q'_{ij},$$

the number of attributes required by item $i$ according to $Q'$.

Without loss of generality, assume $a_i > 1$ for $i = 1, \ldots, n$ and $a_i = 1$ for $i = n + 1, \ldots, k$.
Equivalently, among the first $k$ items, only the first $n$ items require more than one
attribute, while the $(n+1)$-th through the $k$-th items require only one attribute each,
all of which are distinct. Without loss of generality, we assume $Q'_{ii} = 1$ for $i = n + 1, \ldots, k$
and $Q'_{ij} = 0$ for $i = n + 1, \ldots, k$ and $j \neq i$.

(a) $n = 1$. Since $a_1 > 1$, there exists $i > 1$ such that $Q'_{1i} = 1$. Then the row vector
of $T(Q')$ corresponding to $I_1 \wedge I_i$ (say, the $l$-th row of $T(Q')$) and the row vector
of $T(Q')$ corresponding to $I_1$ are identical. On the other hand, the first row and
the $l$-th row of $T(Q)$ are different because $T(Q)_{1:(2^k-1)}$ is a full-rank matrix. The
above statement can be written as

$$B_{Q'}(I_1 \wedge I_i) = B_{Q'}(I_1), \qquad B_Q(I_1 \wedge I_i) \neq B_Q(I_1).$$

(b) $n > 1$ and there exist $j > n$ and $i \le n$ such that $Q'_{ij} = 1$. Then, by the same
argument as in (a), we can find two rows that are identical in $T(Q')$ but different
in $T(Q)$. In particular,

$$B_{Q'}(I_j \wedge I_i) = B_{Q'}(I_i), \qquad B_Q(I_j \wedge I_i) \neq B_Q(I_i).$$

(c) $n > 1$ and $Q'_{ij} = 0$ for each $j > n$ and $i \le n$. Let the $l^*$-th row of $T(Q')$ correspond
to $I_1 \wedge \cdots \wedge I_n$. Let the $l^*_h$-th row of $T(Q')$ correspond to
$I_1 \wedge \cdots \wedge I_{h-1} \wedge I_{h+1} \wedge \cdots \wedge I_n$ for $h = 1, \ldots, n$.

We claim that there exists an $h$ such that the $l^*$-th row and the $l^*_h$-th row of $T(Q')$ are
identical, that is,

$$B_{Q'}(I_1 \wedge \cdots \wedge I_{h-1} \wedge I_{h+1} \wedge \cdots \wedge I_n) = B_{Q'}(I_1 \wedge \cdots \wedge I_n). \qquad (34)$$

We prove this claim by contradiction. Suppose that there does not exist such an
$h$. This is equivalent to saying that for each $j \le n$ there exists an attribute $\tau_j$ such that
$Q'_{j\tau_j} = 1$ and $Q'_{i\tau_j} = 0$ for all $1 \le i \le n$ with $i \neq j$. Equivalently, for each $j \le n$,
item $j$ requires at least one attribute that is not required by the other first $n$ items.
Consider

$$C_i = \{j : Q'_{i'j} = 1 \text{ for some } i \le i' \le n\}.$$

Let $\#(\cdot)$ denote the cardinality of a set. Since $Q'_{ij} = 0$ for each $i \le n$ and $j > n$,
we have $\#(C_1) \le n$. Note that $Q'_{1\tau_1} = 1$ and $Q'_{i\tau_1} = 0$ for all $2 \le i \le n$.
Therefore, $\tau_1 \in C_1$ and $\tau_1 \notin C_2$, so $\#(C_2) \le n - 1$. By a similar argument
and induction, we have $a_n = \#(C_n) \le 1$. This contradicts the fact that
$a_n > 1$. Therefore, there exists an $h$ such that (34) is true. As for $T(Q)$, we have

$$B_Q(I_1 \wedge \cdots \wedge I_{h-1} \wedge I_{h+1} \wedge \cdots \wedge I_n) \neq B_Q(I_1 \wedge \cdots \wedge I_n).$$

In summary, there exist at least two rows of $T(Q')_{1:(2^k-1)}$ that are identical, while the
corresponding rows of $T(Q)_{1:(2^k-1)}$ are different. This concludes the proof.